Abstract

Most of the work to date on automated control-knowledge acquisition has been aimed at improving the efficiency of planning; this work has been termed “speed-up learning”. The work presented here focuses on the automated acquisition of control knowledge to guide a planner towards better solutions, i.e. to improve the quality of plans produced by the planner, as its problem solving experience increases. To date no work has focused on automatically acquiring knowledge to improve plan quality in planning systems. We present a taxonomy of plan quality metrics and a first prototype that partially automates the task of acquiring quality-enhancing control knowledge for the PRODIGY nonlinear planner. We are working on testing the effect of such control knowledge on plan quality, and on developing methods to learn such control knowledge. Two complex domains, namely a transportation logistics domain and a machining process planning domain, are being used to evaluate these ideas.
1 Introduction


The focus of this work is on the automated acquisition of search control knowledge for planning systems. Most of the work to date on automated control-knowledge acquisition has been aimed at improving the efficiency of planning; this work has been termed speed-up learning. Our focus is on the acquisition of control knowledge to guide the planner towards better solutions, i.e. to improve the quality of plans produced by the planner.

There are many variations on the notion of the quality of a plan in the context of classical planning systems [Langley and Drummond, 1990], such as:

• the length of the solution path or the total number of actions

• the execution time of the plan

• the energy, or other resources, required for the plan execution

• the robustness of the plan, or its ability to respond well under changing or uncertain conditions.

Human experts gather knowledge for producing better plans through experience. Here “better” is defined in a context-sensitive manner as a combination of plan-quality factors such as those listed above. It is precisely this experiential knowledge that we seek to capture from planning experience. We propose to focus our attention primarily on acquiring control knowledge to guide the planner’s search towards better solutions during planning, rather than on post-facto plan modification. The framework for this work is the PRODIGY architecture.

The rest of this section gives an example of plan quality in a particular domain. Section 2 proposes a taxonomy of plan quality metrics. Section 3 explores the relationship between plan quality and goal interactions. Section 4 gives some background on PRODIGY, the example domains used in this work, and how PRODIGY’s search decisions affect plan quality. Section 5 describes an implemented prototype for semi-automated acquisition of control rules to improve plan quality in PRODIGY. Section 6 briefly presents work in progress for learning quality-enhancing control rules. Section 7 analyzes related work, and Section 8 concludes with a summary of the expected contributions of this work¹.

1.1  An  Example:  Plan  Quality  in  a  Process  Planning  Domain 

In the process planning phase of production manufacturing, plan quality is crucial in order to minimize both resource consumption and execution time. The goal of process planning is to produce plans for machining parts given their specifications. Such planning requires taking into account both technological and economic considerations [Descotte and Latombe, 1985, Doyle, 1969], for instance:

• It may be advantageous to execute several cuts on the same machine with the same fixture to reduce the time spent setting up the work on the machines.

• If a hole H1 opens into another hole H2, then it is recommended to machine H2 before H1 in order to avoid the risk of damaging the drill.


¹A slightly different version of this document was submitted as a thesis proposal in the School of Computer Science, Carnegie Mellon, in December 1992.


Figure 1: An Example of Set-Up in the Machining Domain (from [Joseph, 1992]). In this example the holding device is a vise, the machine a drilling machine, and the tool a drill bit.

Most of these considerations are not strict constraints but only preferences when compromises are necessary. They often represent both the experience and the know-how of engineers, so they may differ from one company to another.

Let us look at a concrete example of the difference in quality of plans in this domain, in particular in its implementation as one of PRODIGY’s domains [Gil, 1991]. The domain concentrates on the machining, joining, and finishing steps of production manufacturing. The goal is to produce one or more parts according to certain specifications. An example of a request would be for a rectangular block of 5”×2”×1” made of aluminum and with a centered hole of diameter 1/32” running through the length of the part. In order to perform an operation on a part, the part has to be secured to the machine table with a holding device, and in many cases the part has to be clean and free of burrs from preceding operations. The appropriate tool has to be selected and installed in the machine as well. As an example, Figure 1 shows a machine set-up to drill a hole in a part. Figure 2 sketches graphically the steps to produce a reamed hole. Before performing each of these steps, the appropriate tool has to be set in the machine spindle, namely a spot drill, a high-helix drill, and a reamer. Then some holding device (a vise in the example) has to be put on the machine, and the part has to be held by the holding device.


[Figure 2 panels: has-spot, has-hole, is-reamed]


Figure 2: The Steps to Make a Reamed Hole: first a spot hole is drilled on the part, then the hole itself is made, and finally the hole is reamed. For each of these operations the appropriate tool (respectively a spot drill, some appropriate drill bit, or a reamer) has to be installed in the drilling machine.

Suppose the planner has to build a plan to make a part with two reamed holes on one of its sides. If the planner works on making each hole separately, it will obtain the solution sketched in Figure 3(a) (the operators to hold the part are omitted). This solution is not the shortest one (and in this domain a shorter solution may mean a faster and cheaper way to produce a large number of parts). Some steps may be eliminated by reordering the operations. Both holes, and their spot holes for that matter, have to be on the same side and may be made with the same tools. Therefore, once we have set the appropriate tool in the drill spindle and held the part on the machine table, the operations corresponding to both holes can be performed consecutively. Figure 3(b) shows a better solution to the problem. In this example the planner obtains the better solution by interleaving the problem goals. In PRODIGY this decision may be encoded in the form of a search control rule.


(a)
hole1:
    put tool to drill a spot hole
    clean the part
    drill spot hole for hole1
    change tool to make hole
    drill hole1
    change tool to ream
    clean the part
    ream hole1
hole2:
    change tool to drill a spot hole
    drill spot hole for hole2
    change tool to make hole
    drill hole2
    change tool to ream
    clean the part
    ream hole2

(b)
hole1 and hole2:
    put tool to drill a spot hole
    clean the part
    drill spot hole for hole1
    drill spot hole for hole2
    change tool to make hole
    drill hole1
    drill hole2
    change tool to ream
    clean the part
    ream hole1
    clean the part
    ream hole2


Figure 3: Two Plans of Different Quality to Make Two Reamed Holes on a Part. Some steps in solution (a) may be eliminated by reordering the operations, since once the corresponding tool is set, the operation may be performed for both holes consecutively. Solution (b) captures this improvement by interleaving the operations on hole1 and hole2.


2 A Taxonomy of Quality Metrics

In this section we propose a taxonomy of quality metrics for planning systems. These metrics can be classified into three large groups. The planner’s solutions can be compared in terms of their execution cost, their robustness or reliability under unexpected circumstances, and the satisfaction of the client with the solution itself (for example the accuracy of the result, or the comfort it provides to the user). Figure 4 presents this taxonomy, and the next subsections explore it in detail. The problem of finding good quality plans is different from that of reducing the effort required to generate plans. If only limited resources (time or space) are available for planning, the planner may have to give up trying to find a good solution and settle for any solution within the given resource bound. Section 2.6 discusses metrics of problem solving efficiency and some previous work on improving it in PRODIGY.

2.1  Minimizing  Execution  Cost 

The quality of a plan is strongly related to the cost of executing it. Some of the factors that affect a plan’s execution cost can be computed by summing over all the steps or operators in the plan, that is,

$$C_{total} = \sum_{i} c_i$$

where $C_{total}$ is the total cost of executing the plan and $c_i$ is the cost of each operator. $c_i$ can be the operator execution time, the cost of the resources used by the step, or 1 if the measure is simply the length of the plan or total number of actions. This section analyzes factors that influence a plan’s execution cost.
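To make the additive cost model concrete, the following minimal Python sketch (not part of PRODIGY) computes C_total for the two plans of Figure 3 under two choices of the per-step cost c_i. The step names are transcribed from Figure 3 (holding operators omitted, as in the figure); the per-operator times in `time_cost` and all function names are hypothetical, chosen only for illustration.

```python
from typing import Callable, Sequence

# Plan (a) of Figure 3: each hole is made separately.
PLAN_A = [
    "put tool to drill a spot hole", "clean the part", "drill spot hole for hole1",
    "change tool to make hole", "drill hole1", "change tool to ream",
    "clean the part", "ream hole1",
    "change tool to drill a spot hole", "drill spot hole for hole2",
    "change tool to make hole", "drill hole2", "change tool to ream",
    "clean the part", "ream hole2",
]

# Plan (b) of Figure 3: the operations on hole1 and hole2 are interleaved.
PLAN_B = [
    "put tool to drill a spot hole", "clean the part", "drill spot hole for hole1",
    "drill spot hole for hole2", "change tool to make hole", "drill hole1",
    "drill hole2", "change tool to ream", "clean the part", "ream hole1",
    "clean the part", "ream hole2",
]

def total_cost(plan: Sequence[str], step_cost: Callable[[str], float]) -> float:
    """C_total: the sum of the per-step costs c_i over every step of the plan."""
    return sum(step_cost(step) for step in plan)

def unit_cost(step: str) -> float:
    # c_i = 1 for every step, so C_total is simply the plan length.
    return 1.0

def time_cost(step: str) -> float:
    # Hypothetical per-operator execution times in minutes (illustrative only).
    if step.startswith(("put tool", "change tool")):
        return 5.0
    if step.startswith("clean"):
        return 2.0
    return 1.0

print(total_cost(PLAN_A, unit_cost), total_cost(PLAN_B, unit_cost))  # 15.0 12.0
print(total_cost(PLAN_A, time_cost), total_cost(PLAN_B, time_cost))  # 42.0 27.0
```

With unit costs the comparison reduces to plan length; with time-based costs the gap widens because plan (b) saves tool changes, which are the most expensive steps under these assumed times.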

Execution time: a straightforward measure of execution time is the number of operations, or length, of the plan. However, different actions frequently take different times to execute, and therefore considering the execution time of each operator is more significant than just the length of the plan. The plan topology affects execution time if several actions may be executed in parallel.
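One way to see how plan topology affects execution time is to compare the sum of step durations with the length of the longest (critical) path through the plan’s partial order. The Python sketch below assumes a hypothetical logistics example with made-up durations and an acyclic dependency map `after`; it illustrates the idea and is not a description of how PRODIGY measures time.

```python
from functools import lru_cache
from typing import Dict, List

def sequential_time(duration: Dict[str, float]) -> float:
    # Totally ordered plan: each step starts when the previous one ends.
    return sum(duration.values())

def parallel_time(duration: Dict[str, float], after: Dict[str, List[str]]) -> float:
    """Earliest finish time of the whole plan when independent steps overlap.

    after[s] lists the steps that must finish before s can start (the plan's
    partial order, assumed acyclic). The result is the length of the longest,
    or critical, path through that dependency graph.
    """
    @lru_cache(maxsize=None)
    def finish(step: str) -> float:
        start = max((finish(p) for p in after.get(step, [])), default=0.0)
        return start + duration[step]

    return max(finish(step) for step in duration)

# Hypothetical logistics example: two packages moved by two different trucks.
DURATION = {
    "load pkg1 on truck1": 1.0, "drive truck1": 4.0, "unload pkg1": 1.0,
    "load pkg2 on truck2": 1.0, "drive truck2": 3.0, "unload pkg2": 1.0,
}
AFTER = {
    "drive truck1": ["load pkg1 on truck1"], "unload pkg1": ["drive truck1"],
    "drive truck2": ["load pkg2 on truck2"], "unload pkg2": ["drive truck2"],
}

print(sequential_time(DURATION), parallel_time(DURATION, AFTER))  # 11.0 6.0
```

Both orderings contain the same six steps, but because the two truck subplans are independent, the parallel plan finishes when its slowest chain does.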

Material resources: a resource is something needed to perform a plan step. In a factory scheduling domain, or the process planning domain described above, resources include personnel, material, tools and machines. Each action specifies which resources it requires and the characteristics that the resource must have. Resources can be classified in several categories:

• Time.

• Energy.

• Consumable resources are those for which an initial stockpile is available which can only be depleted by actions in the plan [Currie and Tate, 1991], for example materials. Particular instances of these resources cannot be reused by several actions. Consumable resources can be further classified as renewable and non-renewable. The agent has some way of obtaining more renewable resources if it runs out of them (for example, it can acquire more metal stock). However, once it runs out of some non-renewable resource that it still needs, the agent cannot make progress in the plan execution.

• Non-consumable resources are those that are rendered busy for a period of time, and then released and made available to other actions. This includes the use of machines and tools. The consumption of resources is related to, but different from, an object wearing out. For example, machines wear out, but are usually considered non-consumable resources. Wear is related to the cost of an operator that makes use of the resource, but only indirectly.
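The resource categories above can be sketched as two small data structures: a consumable stockpile that actions deplete, with a renewable flag, and a non-consumable resource that is reserved for time intervals and then released. This Python sketch is illustrative only; the class names, the replenishment policy for renewable resources, and the interval-overlap rule are assumptions, not PRODIGY’s representation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ConsumableResource:
    """Initial stockpile that plan actions can only deplete (e.g. metal stock)."""
    name: str
    stock: int
    renewable: bool  # can the agent obtain more when it runs out?

    def consume(self, amount: int) -> bool:
        if amount <= self.stock:
            self.stock -= amount
            return True
        if self.renewable:
            # Hypothetical replenishment policy: acquire exactly the shortfall,
            # so the stock ends at zero after this action.
            self.stock = 0
            return True
        return False  # non-renewable and exhausted: execution cannot progress

@dataclass
class NonConsumableResource:
    """Resource held busy for an interval and then released (machines, tools)."""
    name: str
    busy: List[Tuple[float, float]] = field(default_factory=list)

    def reserve(self, start: float, end: float) -> bool:
        # Refuse the request if it overlaps an existing reservation.
        if any(start < e and s < end for s, e in self.busy):
            return False
        self.busy.append((start, end))
        return True
```

For example, `ConsumableResource("metal stock", 3, renewable=True).consume(5)` succeeds by replenishing, whereas the same request on a non-renewable resource fails, which is exactly the situation in which the agent cannot make progress.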

The use of resources is closely related to the cost of executing a plan. Good plans try to make the best use of the resources available. There are many considerations about resources and resource use that influence the quality of a plan:

• Maximize the fraction of time in which a resource is actually used, and minimize inter-operation transfer times. The planner should try to reduce the amount of time a resource remains idle but locked by some action. An example of this is reducing the time an airplane sits at the airport waiting for its load to arrive. This idle time may force the planner to use a different resource for other actions in the meantime, or else to create a plan with a longer execution time. As another example, in the Hubble Space Telescope (HST) domain, a good schedule will try to maximize the fraction of time spent actually recording data on any instrument of the telescope, as opposed to (for example) realigning the telescope or switching instruments [Muscettola and Smith, 1990]. Scheduling systems typically address these considerations in order to generate good schedules [Fox and Smith, 1984].

• Minimize the number of resource requests that will be necessary during plan execution: human experts try to reduce resource reservations, as a resource request may create waiting times until the resource becomes available, if it is being used by a different agent. Once one obtains a resource, it is better to use it as much as possible before releasing it. In the process planning domain example presented in Section 1.1, the better solution takes advantage of the same set-up to perform several operations, instead of setting up the machine, tool and part once for each operation. This reduces not only the length of the plan but also the number of resource requests.

• Resource sharing may reduce the length and cost of a plan, for example when the same truck is used to move two different packages to the same destination. Sometimes, however, it is better not to use a resource that is being used by other operators. This is a way to avoid resource conflicts with the plans of other agents, and therefore the resulting plans are more amenable to parallelization. We made use of this criterion in previous work on multiagent planning [Perez, 1991]: resource preferences were encoded in control rules, mostly domain independent, and then plan topology and resource conflicts were analyzed in order to build parallel plans.

²We make a distinction between the process planning problem and the scheduling problem. The former can be defined as selecting a sequence of operations whose execution results in the achievement of a goal, for example the completion of an order in a job shop. The latter focuses on the assignment of start and end times and resources to each of those operations.
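The point about minimizing resource requests can be checked on plans (a) and (b) of Figure 3 by counting tool installations. The Python sketch below abstracts each machining operation as a (hole, tool) pair and counts a set-up whenever the required tool differs from the one already in the spindle. The function names and the abstraction are hypothetical; the three-stage tool sequence follows Figure 2.

```python
from typing import List, Optional, Tuple

# Tools for the has-spot, has-hole and is-reamed stages of Figure 2.
STAGES = ["spot drill", "high-helix drill", "reamer"]

def tool_setups(ops: List[Tuple[str, str]]) -> int:
    """Count tool installations: one each time the required tool changes."""
    setups, current = 0, None  # type: int, Optional[str]
    for _hole, tool in ops:
        if tool != current:
            setups += 1
            current = tool
    return setups

def separate(holes: List[str]) -> List[Tuple[str, str]]:
    # Plan (a) style: finish every stage of one hole before starting the next.
    return [(hole, tool) for hole in holes for tool in STAGES]

def interleaved(holes: List[str]) -> List[Tuple[str, str]]:
    # Plan (b) style: apply each tool to every hole before changing tools.
    # Per-hole stage order (spot, hole, ream) is preserved in both plans.
    return [(hole, tool) for tool in STAGES for hole in holes]

holes = ["hole1", "hole2"]
print(tool_setups(separate(holes)), tool_setups(interleaved(holes)))  # 6 3
```

Interleaving the goals keeps the number of tool requests at three regardless of how many holes share the side, while making each hole separately costs three requests per hole.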

The choice of resources affects not only the plan execution cost but also its robustness. Many factors related to the availability of alternative resources, such as machine downtime, machine substitutability, or alternative production processes, can affect the reliability of a plan and the availability of alternative actions when use of the chosen resource fails.

Agent skill requirements: these refer to the extent to which an agent can perform an action. Some examples are strength, speed, intellect, and how good the agent is at a particular skill. Plans with lower skill requirements are typically less expensive.

Plan complexity: plan complexity affects both the execution cost and the robustness of a plan, and there usually exists a trade-off between these two measures of plan quality. The complexity of a plan can be seen at two levels:

• operator complexity: the choice of a particular operator is influenced by how likely it is to fail or have unpredictable results, and also by its execution cost, both in time and resources.

• complexity of the plan structure: plan length is a straightforward measure of the quality of a plan. However, in some cases it is worth considering the plan topology as well. The dependencies among operators and their concurrent use of resources influence execution time, as several actions may be performed in parallel if they are independent [Perez, 1991]. They also influence plan robustness when several actions contend for the same resource.

2.2  Maximizing  Plan  Robustness 

By robustness of a plan we mean its ability to respond well under changing or uncertain conditions. A plan’s execution may fail because of unexpected environmental changes or events, because actions do not have their intended effects (the resources are not reliable, or the operators’ outcomes are uncertain³), or even because of inadequacies of the planner itself. When a failure occurs, execution of a plan may go awry and produce outcomes considerably different from the desired goal. Therefore the quality of a plan depends on its reliability and its potential for recovery after a failure. Recent work focusing on execution-failure recovery uses different methods for adapting a plan upon failure so execution can continue [Howe and Cohen, 1991]. Another approach is to improve the planner’s knowledge to guide its search towards more robust plans. Note that there is usually a trade-off between cost and robustness [Feldman and Sproull, 1977].

Three aspects determine a plan’s robustness:

• probability of failure

• extent of the failure

• possibility of failure recovery

³However, PRODIGY in particular assumes that there is no uncertainty in the operators or the world model. For a thorough investigation of inadequate operators and learning to improve their fidelity to external actions, see [Gil, 1992].


To reduce the probability of an execution failure, the planner may reuse plans that proved successful in similar past situations and were stored at the time. Plans that are complex, both at the operator level and in the plan topology, are more prone to execution failures, as discussed above. Some operators may be more likely to fail or act unpredictably than others. Complex plans with many dependencies among operators and shared resources may suffer from resource contention and therefore fail at execution time.

Another factor for plan robustness is whether the consequences of a failure during execution are localized. If all the plan actions interact and one of them fails, the plan may fail without making any progress towards the goal. However, if two parts of a plan are non-interacting, a failure in one part will not affect the other. (We refer to these types of failures as containable failures.) In particular, linearly decomposable plans are preferred, i.e. those that can be decomposed into independent subplans that achieve different subgoals. Therefore, localizing the parts of the plan that can cause execution failures increases the possibility of partial success of the plan. By analyzing those parts, the planner can incorporate redundant steps that increase the probability of success of the whole plan.
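Under the simplifying assumption that step failures are independent (an assumption PRODIGY itself does not model, see footnote 3), the effects described above can be quantified: a plan succeeds only if every step succeeds, a redundant backup step lowers the failure probability of the step it protects, and in a linearly decomposable plan each independent subplan contributes its own expected subgoal. The failure probabilities and function names in this Python sketch are hypothetical.

```python
from math import prod
from typing import List

def success_probability(fail_probs: List[float]) -> float:
    # Every step must succeed; step failures are assumed independent.
    return prod(1.0 - p for p in fail_probs)

def with_redundancy(p_fail: float, p_backup_fail: float) -> float:
    # A step backed by an independent redundant step fails only if both fail.
    return p_fail * p_backup_fail

def expected_subgoals(subplans: List[List[float]]) -> float:
    # Linearly decomposable plan: each independent subplan achieves its own
    # subgoal, so a (containable) failure in one does not affect the others.
    return sum(success_probability(s) for s in subplans)

print(round(success_probability([0.1, 0.1]), 3))                       # 0.81
print(round(success_probability([with_redundancy(0.1, 0.1), 0.1]), 3))  # 0.891
print(round(expected_subgoals([[0.1], [0.2]]), 3))                      # 1.7
```

Adding a backup for the first step raises the success probability of the two-step plan from 0.81 to 0.891, and splitting a plan into independent subplans means that a single failure still leaves the other subgoal achievable.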

Failures are often unpredictable and, in spite of efforts to reduce their chances of occurring, they may eventually happen. Plan robustness includes the possibility of recovery after a failure. This possibility increases if the use of non-renewable resources in the plan is minimal: if the agent runs out of a resource of this type, no recovery that requires that same resource is possible. Recovery is facilitated when alternatives to the faulty parts of the plan are available. These alternatives may be built into the plan by planning in advance for contingencies. The alternatives may also be stored in a library of recovery methods [Howe and Cohen, 1991]. Or they can be left unexplored, in which case they can be used upon failure at run time for replanning and recovery.

2.3  Client  Satisfaction 

There are some other factors of plan quality that can hardly be placed in the previous categories and in some cases are hard to quantify. For example, the human user may prefer a plan that takes him from one city to another in first class instead of second class. The process planning domain exemplifies how the degree of accuracy required in a part may influence the choice of machine or tool to perform a given operation, if the result has to satisfy a fine-grain tolerance.

In scheduling systems that try to find the optimal schedule, the value of the resulting state is an applicable criterion to measure plan quality. The quality of a solution can be measured by the number of goals achieved by the solution or the value of such goals. For example, in the case of scheduling the operations of the Hubble Space Telescope, the quality of the solution obtained depends on the number of proposals that the HST can accommodate [Muscettola and Smith, 1990] and on how the accommodated proposals relate to the program and observation priorities, which are given as part of the problem to solve.⁴

2.4  Trade-Offs  Among  Quality  Metrics 

When deciding which plan is better one can easily run into trade-offs. This can be illustrated with an example from the process planning domain. In this domain several machines can be used to reduce the size of a part. The choice of a particular type of machine may depend on the degree of accuracy desired, the economy of the plan, or the time required for execution. The last two factors in turn might only be relevant depending on the number of parts on which the same operation has to be performed (i.e. how many times the same plan will be executed). Grinding a part gives better finishes and holds closer tolerances than other machines, but it may be more expensive. The shaping and planing operations, used to reduce a part’s size, are slower than milling the part, but the tools they use are less expensive and easier to sharpen. Therefore it may be necessary to consider trade-offs among the different options. Factors such as the skills available and required to operate the machine, personal likes and dislikes, and availability may also need to be considered. Most of these factors are less basic than others but in some cases may be decisive.

⁴Note, however, that at present PRODIGY’s solution to a problem achieves all the goals in the problem, or else PRODIGY deems the problem unsolvable. Therefore, in that sense, all plans have the same value with respect to the degree of goal satisfaction.

Scheduling systems face a similar problem when the constraints encoding different quality factors conflict [Fox and Smith, 1984]. For example, removing a machine’s second shift may decrease costs but may also cause an order to miss its due date. Therefore, in these systems one cannot rely only on constraint propagation techniques to arrive at acceptable solutions. Rather, they relax some constraints and find a solution that best satisfies the remaining ones.
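One common way to make such trade-offs explicit, shown here as a swapped-in illustration rather than the method proposed in this work, is to keep only options that no other option beats on every metric (the Pareto front) and then pick among those with context-dependent weights. The (cost, time, tolerance) values below are hypothetical and only mirror the qualitative claims above; "option-d" is an invented option included to show a dominated choice.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical (cost, time, tolerance) per option; lower is better on every axis.
OPTIONS: Dict[str, Tuple[float, float, float]] = {
    "grinding": (8.0, 3.0, 0.001),  # expensive, but holds closer tolerances
    "milling": (5.0, 2.0, 0.005),   # faster than shaping
    "shaping": (3.0, 5.0, 0.005),   # slower, but cheaper tools
    "option-d": (6.0, 4.0, 0.005),  # hypothetical, dominated by milling
}

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # a is at least as good as b everywhere and strictly better somewhere.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(options: Dict[str, Tuple[float, float, float]]) -> List[str]:
    return sorted(n for n, v in options.items()
                  if not any(dominates(w, v) for m, w in options.items() if m != n))

def best(options: Dict[str, Tuple[float, float, float]], weights: Sequence[float]) -> str:
    # Weighted sum of metrics; the weights encode the context-sensitive priorities.
    return min(options, key=lambda n: sum(w * x for w, x in zip(weights, options[n])))

print(pareto_front(OPTIONS))  # ['grinding', 'milling', 'shaping']
print(best(OPTIONS, (1.0, 0.0, 0.0)), best(OPTIONS, (0.0, 1.0, 0.0)),
      best(OPTIONS, (0.0, 0.0, 1.0)))  # shaping milling grinding
```

Three options survive the dominance check, and which one is "best" depends entirely on the weights, which is the sense in which plan quality is context-sensitive.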

2.5  Domain-Dependent  Versus  Domain-Independent  Metrics 

In some cases the ways to measure the quality of a plan are clearly domain dependent. The goal of the scheduler presented in [Perry, 1990] is to schedule launches and terminal illuminators to maximize the depth of fire, or number of shots at incoming threats. Other criteria, such as “minimize the consumption of resources,” seem obvious and applicable to every domain. However, we can find cases where this is not the best thing to do. The distinction between domain-dependent and domain-independent aspects is not always clear. We take this example from [Wilkins, 1988]: consider the advice “use existing objects.” This is a fairly domain-independent concept that is used by NOAH, and mentioned by Wilensky as a meta-goal for planning. However, this idea still involves domain-dependent knowledge. In a house-building domain, it is desirable to use the same piece of lumber to support the roof and the sheetrock on the walls. But in other domains this may not be a good strategy. On the space shuttle, one may want different functions performed by different objects so the plan will be more robust and less vulnerable to the failure of any one object. So the “use existing objects” idea makes assumptions about the domain that need to be stated (perhaps one wants to apply this idea only to certain portions of the domain).

2.6  Measures  of  Planning  Cost 

The quality of a planning algorithm depends both on the quality of the solutions generated and on the effort spent searching for them. If only limited computational resources are available to the planner during problem solving, the planner may have to trade off solution quality in order to find a solution at all. The planning cost depends on both the time and the space spent during problem solving. Two measures of planning cost have been used in the literature on learning and planning (for example [Minton, 1988, Perez and Etzioni, 1992]), namely search time and number of nodes in the search tree. Figure 5 summarizes planning cost criteria. Planning time can be measured as time spent in pure planning, or amortized when planning and learning are interleaved. Planning space is usually measured as working space (for example, number of nodes expanded in search). When planning and learning are interleaved and the planner stores knowledge that will be useful later, a long-term use of space has to be considered. The stored knowledge can take the form of search control rules extracted from problem solving traces, or of cases in a case library [Veloso, 1992, Kambhampati, 1990, Hammond, 1986]. Recycling past successful experience reduces the search effort when solving new similar problems. Note that there is usually a trade-off among the amount of knowledge stored, the cost of accessing and reusing it, and the savings in search gained from it [Minton, 1988].
Figure 5: A Taxonomy of Metrics of Planning Cost. Most of the machine learning mechanisms designed to date to acquire control knowledge for planning systems are aimed at improving the efficiency of planning. These techniques are known as speed-up learning.


2.6.1  Solution  Quality  Versus  Problem  Solving  Efficiency 

As we have already mentioned, the problem of finding good quality plans is different from that of reducing the effort required to generate plans. In many domains finding a plan at all requires a considerable amount of search, and there has been work on improving the efficiency of a problem solver with machine learning techniques [Mitchell et al., 1983, Laird et al., 1986, Gratch and DeJong, 1992, Korf, 1985, Veloso, 1992, Minton, 1988, Knoblock, 1991b, Etzioni, 1990]. We call this speed-up learning. However, these mechanisms have paid little or no attention to the quality of the solutions obtained. Here we briefly present some examples of speed-up learning systems in the context of PRODIGY.

PRODIGY’s explanation-based learning system (EBL) [Minton, 1988] constructs explanations from a problem solving trace and an axiomatized theory describing both the domain and relevant aspects of the problem solver’s architecture. The resulting descriptions are then expressed in control rule form, and control rules whose utility in search reduction outweighs their application overhead are retained. The system can learn from success, failure and goal interaction. These concepts are represented declaratively as target concepts. The analysis of goal interactions may lead to better plans (see Section 3), but that is not a goal of the EBL module. For the EBL module to learn control knowledge that improves the quality of the plans, it would be necessary to augment the domain theory and target concepts to be able to explain, or prove, why the solution obtained in the current problem solving episode is a good one. Something similar may be said of systems that perform static analysis on the problem space representation [Etzioni, 1990, Perez and Etzioni, 1992].
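The retention criterion for learned control rules can be sketched following the general form of Minton’s utility estimate, in which a rule’s utility is its average search savings times its application frequency, minus its average match cost. The Python below is a minimal sketch under that assumption; the class, field names, and numbers are hypothetical and do not reproduce PRODIGY’s EBL code or its measured statistics.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RuleStats:
    name: str
    avg_savings: float     # average search time saved when the rule applies
    applic_freq: float     # fraction of match attempts in which the rule applies
    avg_match_cost: float  # average time spent testing the rule's conditions

    def utility(self) -> float:
        # Savings weighted by how often they occur, minus the matching overhead.
        return self.avg_savings * self.applic_freq - self.avg_match_cost

def retain(rules: List[RuleStats]) -> List[str]:
    # Keep only rules whose expected savings outweigh their matching overhead.
    return [r.name for r in rules if r.utility() > 0]

rules = [RuleStats("r1", 10.0, 0.5, 2.0), RuleStats("r2", 10.0, 0.1, 2.0)]
print(retain(rules))  # ['r1']
```

Note that this criterion rewards only search savings; nothing in it measures whether a rule leads to better plans, which is the gap this work addresses.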

The derivational analogy module of PRODIGY [Veloso, 1992] stores past problem-solving experiences as cases, and reuses them to solve similar problems, obtaining a considerable improvement in problem solving efficiency. The use of one or more past cases to solve the current problem may lead to shorter plans, as reported in [Veloso, 1992]. This was a surprising result but not the focus of the work. Note in fact that if the stored solution to a problem was not a good one, it may be reused to solve subsequent problems without any attempt to find a better solution.

PRODIGY's abstraction planning module [Knoblock, 1991a] divides the axiomatized domain knowledge into multiple abstraction levels. During problem solving, a solution is found first in the top-level space and is then used to guide the search for solutions in more detailed problem spaces. The use of abstraction hierarchies reduces the problem solving space but does not guarantee that the solution obtained is the best one (see [Carbonell et al., 1992] for an example). However, it may lead to shorter solutions, since the abstractions focus the problem solver on the parts of the problem that should be solved first. [Knoblock, 1991b] presents experiments in which the use of abstractions produces solutions that are about 10% shorter than those produced by PRODIGY in certain domains. Note that the measure of plan quality used in this case is the length of the plan, and it seems that these results do not extend to other quality metrics.

In the work reported here we will focus on the part of the taxonomy related to execution cost. See Section 6 for a description of the proposed work plan.

3 Solution Quality and Goal Interactions*

Planning goals rarely occur in isolation. A planner must be capable of taking into account the interactions between conjunctive goals in order to produce a plan to solve the problem. There have been many research efforts addressing the issue of planning for conjunctive goals, focusing on a variety of aspects, including analyzing the complexity of this planning problem [Sussman, 1975, Chapman, 1987], designing appropriate planning algorithms [Sacerdoti, 1977, Tate, 1977, Drummond and Currie, 1987], categorizing different types of goal interactions [Wilensky, 1983], and learning control knowledge to efficiently handle the search for the interactions [Minton, 1988, Etzioni, 1990]. Our work, though built upon previous work, goes beyond it, as we aim at identifying goal interactions directly related to the quality of the plans produced.

From  a  practical  implementation  point  of  view  we  distinguish  two  categories  of  goal  interactions,  explicit 
goal  interactions  and  quality  goal  interactions. 

3.1  Explicit  goal  interactions 

We include in this category the goal interactions that are explicitly represented as part of the domain knowledge in terms of preconditions and effects of the operators. A plan exhibits a goal interaction of this type when there is a goal in the plan that has been negated by a previous step in the plan [Minton, 1988, Etzioni, 1990]. These goal interactions enforce particular goal orderings so that the planner is able to produce a solution to the problem. In a typical example of a two-goal interaction, after one of the goals has been achieved, it is deleted by an operator that works towards achieving the other goal.†

Goal interactions in this category include the well-known Sussman's anomaly in the blocksworld [Sussman, 1975]. Consider also another illustrative example in a transportation domain. In this domain, packages are to be moved among different cities. Packages are carried within the same city in trucks and across cities in airplanes. Trucks and airplanes may have limited capacity. At each city there are several locations, e.g. post offices (po) and airports (ap). A package P1 is at the Pittsburgh airport. There is only one airplane, A1, also available at the Pittsburgh airport. The goal consists of having both the airplane and the package at the Boston airport, and is represented by the conjunction (and (at-airplane A1 bos-airport) (at-object P1 bos-airport)). If the goal (at-airplane A1 bos-airport) is addressed first, and A1 flies from Pittsburgh to Boston, there is no way to achieve the second goal without first flying A1 back to Pittsburgh. The resulting plan involves flying A1 back and forth unnecessarily. It is conceivable to design an algorithm that fixes this kind of plan by removing unnecessary operations that reachieve the clobbered goals [Rich and Knight, 1991].
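The airplane example can be checked mechanically. The following is a minimal STRIPS-style simulation under stated assumptions: ground operators are tuples of (name, preconditions, add-list, delete-list), the literal (inside P1 A1) and all operator names are hypothetical, and the clean-up step is a brute-force illustration of removing unnecessary operations, not the algorithm of [Rich and Knight, 1991].

```python
# Hypothetical ground operators: (name, preconditions, add-list, delete-list).
def fly(plane, frm, to):
    return (f"fly {plane} {frm} {to}", {("at-airplane", plane, frm)},
            {("at-airplane", plane, to)}, {("at-airplane", plane, frm)})

def load(obj, plane, loc):
    return (f"load {obj} {plane} {loc}", {("at-object", obj, loc), ("at-airplane", plane, loc)},
            {("inside", obj, plane)}, {("at-object", obj, loc)})

def unload(obj, plane, loc):
    return (f"unload {obj} {plane} {loc}", {("inside", obj, plane), ("at-airplane", plane, loc)},
            {("at-object", obj, loc)}, {("inside", obj, plane)})

def execute(state, plan):
    """Apply each step in turn; return the final state, or None if a precondition fails."""
    state = set(state)
    for _name, pre, add, dele in plan:
        if not pre <= state:
            return None
        state = (state - dele) | add
    return state

def achieves(state, plan, goal):
    final = execute(state, plan)
    return final is not None and goal <= final

def drop_redundant_steps(state, plan, goal):
    """Repeatedly delete the shortest contiguous run of steps whose removal keeps the plan correct."""
    improved = True
    while improved:
        improved = False
        for size in range(1, len(plan) + 1):
            for i in range(len(plan) - size + 1):
                shorter = plan[:i] + plan[i + size:]
                if achieves(state, shorter, goal):
                    plan, improved = shorter, True
                    break
            if improved:
                break
    return plan

initial = {("at-object", "P1", "pit-airport"), ("at-airplane", "A1", "pit-airport")}
goal = {("at-airplane", "A1", "bos-airport"), ("at-object", "P1", "bos-airport")}
# Plan obtained when (at-airplane A1 bos-airport) is addressed first.
clobbering_plan = [fly("A1", "pit-airport", "bos-airport"), fly("A1", "bos-airport", "pit-airport"),
                   load("P1", "A1", "pit-airport"), fly("A1", "pit-airport", "bos-airport"),
                   unload("P1", "A1", "bos-airport")]
fixed = drop_redundant_steps(initial, clobbering_plan, goal)
print([step[0] for step in fixed])
# ['load P1 A1 pit-airport', 'fly A1 pit-airport bos-airport', 'unload P1 A1 bos-airport']
```

The search re-simulates the plan for every candidate deletion, so it is only meant for small plans like this one.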

In some problems, these interactions are unavoidable, and the planning system must find a solution that minimizes their effects. When it does so, search time is typically reduced and better solutions tend to be found. These solutions are generally shorter in length, and more direct [Minton, 1988, Ryu and Irani, 1992, Veloso, 1992].

*This section appears, extended, in [Pérez and Veloso, 1993].

†Other goal interactions in this category may be beneficial to the planning process, when solving one goal makes a second goal easier to achieve. This is generally termed goal concord, and opportunistic planning takes advantage of these situations [Converse and Hammond, 1992].

In least-commitment planners the critics take care of these goal interactions by establishing ordering constraints among the conflicting goals. In the case of PRODIGY, a casual-commitment planner, goal preference control knowledge is automatically acquired to deal effectively (in the sense of problem solving effort) with this kind of goal interaction by different machine learning approaches, namely explanation-based learning [Minton, 1988], static analysis [Etzioni, 1990], or derivational analogy [Veloso, 1992].

A particular problem may have many different solutions. These solutions may differ in the set of operators in the plan. If the ordering constraints between achieving two goals are explicit in the domain representation, then all the solutions to a particular problem will have the two goals interacting. On the other hand, the dependencies may be the result of the particular problem solving path explored. In this case, for some solutions the goals may interact and for others they may not.

3.2  Quality  goal  interactions 

To illustrate this difference and to motivate the quality goal interactions, we further discuss different plans with ordering constraints that are or are not explicit in the domain. In the one-way rocket domain [Veloso, 1989], the goals of moving two objects to a different location interact, because the rocket can only move once. This interaction is represented in the domain definition, as the moving operator explicitly deletes the location of the rocket. The machine-shop scheduling domain [Minton et al., 1989] also constrains holes in parts to be drilled before the parts are polished, as the drilling operator deletes the shining effect. In this domain, the goals of polishing and making a hole in a part interact, again due to the domain definition.

However, in this same machine-shop scheduling domain, when two identical machines are available to achieve two identical goals, these goals may interact if the problem solver chooses to use just one machine for both goals, as it will have to wait for the machine to be idle. If the problem solver uses the two machines instead of just one, then the goals do not interact in this particular solution.
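A toy model shows that this interaction comes from a choice made during problem solving rather than from the domain definition. In the sketch below the goal names, machine names and durations are hypothetical, and two goals are taken to interact exactly when they are assigned to the same machine.

```python
# Hypothetical data: two identical drilling goals, two identical machines.
durations = {"drill-hole-1": 4, "drill-hole-2": 4}   # assumed time units

def makespan(assignment, durations):
    """Completion time when goals assigned to the same machine must wait for each other."""
    busy = {}
    for goal, machine in assignment.items():
        busy[machine] = busy.get(machine, 0) + durations[goal]
    return max(busy.values())

def goals_interact(assignment):
    """In this sketch, goals interact only if they compete for the same machine."""
    machines = list(assignment.values())
    return len(machines) != len(set(machines))

one_machine = {"drill-hole-1": "drill-A", "drill-hole-2": "drill-A"}
two_machines = {"drill-hole-1": "drill-A", "drill-hole-2": "drill-B"}
print(makespan(one_machine, durations), goals_interact(one_machine))    # 8 True
print(makespan(two_machines, durations), goals_interact(two_machines))  # 4 False
```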

There is a variety of equivalent examples in the logistics transportation domain. In general it is not clear what use of resources is overall the best. For example, in the logistics transportation domain, suppose that the problem solver assumes that the same truck (or airplane) must be used when moving objects from the same location into the same (or close) destination. In this case the goals of moving the objects interact. But if different carriers are chosen, there is no such interaction. Note that the problem can become quite complicated as the domain considers other types of constraints, such as capacity of the carriers, size of the objects to be transported, distances between locations that dictate the type of carrier to use, and so on.

In the example presented in Section 1.1 the planner obtains the better solution by interleaving the problem goals. In PRODIGY this decision may be encoded in the form of search control rules. Note that if the goal interactions are not considered, the planner still constructs a valid solution. It is only because of quality considerations that the interactions occur, as the same set-up is used for achieving the goals for both holes. These interactions are not explicitly represented in the domain specification.

These interactions are related to plan quality, as the use of resources dictates the interaction between the goals. The control knowledge that guides the planner in making these decisions is harder to learn automatically, as the domain theory does not encode these quality criteria. Our current research effort is on learning control knowledge to improve the quality of the plans generated by the problem solver.

Some of the interactions between goals are due to the use of a state-space planner, as the operator ordering in the final plan is tied to the goal ordering during problem solving. By using a plan-space planner, in which actions can be inserted anywhere in the plan, some of these problems may go away. However, there is still the issue of which is the appropriate control knowledge, heuristics or critics, to select the best place to insert actions into the plan. There are a few other planners that analyze the relationship between goal interactions and plan quality. Section 7.1 discusses some of this related work.

4 Background: The PRODIGY Problem Solver

PRODIGY is a domain-independent problem solver. Given an initial state and a goal expression, PRODIGY searches for a sequence of operators that will transform the initial state into a state that matches the goal expression. PRODIGY's sole problem-solving method is a form of means-ends analysis. Table 1 describes the basic search cycle of PRODIGY's nonlinear planner [Veloso, 1989]. A complete description of PRODIGY appears in [Carbonell et al., 1992].

1. Check if the goal statement is true in the current state, or there is a reason to suspend the current search path.
   If yes, then either return the final plan or backtrack.

2. Compute the set of pending goals 𝒢 and the set of possible applicable operators 𝒜.

3. Choose a goal G from 𝒢 or select an operator A from 𝒜 that is directly applicable.

4. If G has been chosen, then
   • get the set 𝒪 of relevant operators for the goal,
   • choose an operator O from 𝒪,
   • get the set ℬ of possible bindings for O,
   • choose a set B of bindings from ℬ,
   • go to step 1.

5. If an operator A has been selected as directly applicable, then
   • apply A,
   • go to step 1.

Table 1: A Skeleton of PRODIGY's Nonlinear Search Algorithm (Adapted from [Veloso, 1989]). Problem solving decisions, namely selecting which goal/subgoal to address next, which operator to apply, what bindings to select for the operator, or where to backtrack in case of failure, can be guided by control knowledge. PRODIGY's trace provides all the information about the decisions made during problem solving, so it can be exploited by machine learning methods.
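To make the cycle concrete, here is a heavily simplified Python sketch of a means-ends search loop in the spirit of Table 1; it is not PRODIGY's implementation. Assumptions: operators are ground (name, preconditions, add-list, delete-list) tuples, so the bindings choice of step 4 is folded into the operator choice; there are no inference rules; search is depth-first with a depth bound; and control knowledge is modelled as a function that reorders the candidate choices of step 3. All names are hypothetical. The example reuses the airplane problem of Section 3.

```python
def make_operators(plane, obj, airports):
    """Hypothetical ground operators: (name, preconditions, add-list, delete-list)."""
    ops = []
    for a in airports:
        for b in airports:
            if a != b:
                ops.append((f"fly {plane} {a} {b}", {("at-airplane", plane, a)},
                            {("at-airplane", plane, b)}, {("at-airplane", plane, a)}))
    for loc in airports:
        ops.append((f"load {obj} {plane} {loc}",
                    {("at-object", obj, loc), ("at-airplane", plane, loc)},
                    {("inside", obj, plane)}, {("at-object", obj, loc)}))
        ops.append((f"unload {obj} {plane} {loc}",
                    {("inside", obj, plane), ("at-airplane", plane, loc)},
                    {("at-object", obj, loc)}, {("inside", obj, plane)}))
    return ops

def search_plan(state, goal, operators, control=lambda choices: choices, max_depth=12):
    """Depth-first sketch of the Table 1 cycle; returns applied operator names or None."""
    def step(state, pending_ops, applied, depth):
        if goal <= state:                                # 1. goal statement holds
            return applied
        if depth > max_depth:                            # 1. suspend this path
            return None
        pending_goals = set(goal - state)                # 2. unachieved top-level goals ...
        for op in pending_ops:
            pending_goals |= op[1] - state               #    ... plus unmet preconditions
        applicable = [op for op in pending_ops if op[1] <= state]
        choices = ([("apply", op) for op in applicable] +
                   [("goal", g) for g in sorted(pending_goals)])
        for kind, item in control(choices):              # 3. control knowledge orders choices
            if kind == "apply":                          # 5. apply A
                rest = [op for op in pending_ops if op is not item]
                new_state = (state - item[3]) | item[2]
                result = step(new_state, rest, applied + [item[0]], depth + 1)
                if result is not None:
                    return result
            else:                                        # 4. relevant operators for G
                for op in operators:
                    if item in op[2] and op not in pending_ops:
                        result = step(state, pending_ops + [op], applied, depth + 1)
                        if result is not None:
                            return result
        return None                                      # backtrack
    return step(set(state), [], [], 0)

def packages_before_airplanes(choices):
    """Hypothetical control rule: apply ready operators first, package goals before airplane goals."""
    return sorted(choices, key=lambda c: (c[0] != "apply",
                                          c[0] == "goal" and c[1][0] == "at-airplane"))

ops = make_operators("A1", "P1", ["pit-airport", "bos-airport"])
initial = {("at-object", "P1", "pit-airport"), ("at-airplane", "A1", "pit-airport")}
goal = {("at-airplane", "A1", "bos-airport"), ("at-object", "P1", "bos-airport")}
print(search_plan(initial, goal, ops))                             # 5 steps, A1 flies back and forth
print(search_plan(initial, goal, ops, packages_before_airplanes))  # 3 steps
```

With the default order the sketch addresses (at-airplane A1 bos-airport) first and returns the five-step plan described in Section 3; with the hypothetical goal-ordering rule it returns the three-step plan.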

PRODIGY provides a rich action representation language coupled with an expressive control language. Preconditions in the operators can contain conjunctions, disjunctions, negations, and both existential and universal quantifiers with typed variables. Effects in the operators can contain conditional effects, which depend on the state in which the operator is applied. The control language allows the problem solver to represent and learn control information about the various problem solving decisions, such as selecting which goal/subgoal to address next, which operator to apply, what bindings to select for the operator, or where to backtrack in case of failure. In PRODIGY, there is a clear division between the declarative domain knowledge (operators and inference rules) and the more procedural control knowledge. This simplifies both the initial specification of a domain and the incremental learning of the control knowledge.
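The separation between domain and control knowledge can be pictured with a small model of control rules. The sketch assumes that each rule tests the meta-state and either selects, rejects or prefers one candidate, and that selects are applied before rejects and prefers; the class, the rule and the meta-state keys are hypothetical and do not reproduce PRODIGY's control-rule syntax. The operators themselves are untouched: only the order and membership of the candidate list change.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ControlRule:
    name: str
    action: str                        # "select", "reject" or "prefer"
    target: str                        # candidate the rule selects, rejects or prefers
    condition: Callable[[dict], bool]  # test on the current meta-state

def apply_control_rules(candidates, rules, meta_state):
    fired = [r for r in rules if r.condition(meta_state)]
    selected = [c for c in candidates
                if any(r.action == "select" and r.target == c for r in fired)]
    result = selected or list(candidates)            # no select rule fired: keep all
    result = [c for c in result
              if not any(r.action == "reject" and r.target == c for r in fired)]
    preferred = {r.target for r in fired if r.action == "prefer"}
    return sorted(result, key=lambda c: c not in preferred)  # stable: preferred first

rules = [ControlRule("prefer-airplane-across-cities", "prefer", "UNLOAD-AIRPLANE",
                     lambda m: m["object-city"] != m["goal-city"])]
candidates = ["UNLOAD-TRUCK", "UNLOAD-AIRPLANE"]
print(apply_control_rules(candidates, rules, {"object-city": "pittsburgh", "goal-city": "boston"}))
# ['UNLOAD-AIRPLANE', 'UNLOAD-TRUCK']
print(apply_control_rules(candidates, rules, {"object-city": "boston", "goal-city": "boston"}))
# ['UNLOAD-TRUCK', 'UNLOAD-AIRPLANE']
```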

PRODIGY is designed with a “glass-box” approach: all the steps taken, all the decisions made, and all the information consulted by the engine are available in a problem's trace. This provides an information context in which learning can take place. Previous work on PRODIGY used explanation-based learning techniques, static analysis of the domain definition [Minton, 1988, Etzioni, 1990, Pérez and Etzioni, 1992], and derivational analogy [Veloso, 1992] to acquire search control knowledge to increase problem-solving efficiency. The machine learning and knowledge acquisition work supports PRODIGY's casual-commitment method‡, as it assumes there is intelligent control knowledge, exterior to its search cycle, that it can rely upon to make decisions.

4.1  Example  Domains 

The examples presented throughout this document are extracted from two domains: a transportation logistics domain, and a machining process planning domain. These are the most complex, real-world domains to which PRODIGY has been applied to date. In this section we describe them briefly.

4.1.1 The Transportation Logistics Domain

This is a complex logistics planning domain in which packages are to be moved among different cities. Packages are carried within the same city in trucks and across cities on airplanes. Each truck operates in a single city. At each city there are several locations, e.g. post offices and airports. There are six operators in the domain, namely loading and unloading trucks and airplanes, driving trucks between locations, and flying airplanes between airports. In the current version trucks and airplanes have unlimited capacity. The domain could be extended in different ways, for example to consider the capacity of the carriers and package sizes, the fuel consumption, and/or the distance between cities.

In this domain, interleaving of goals and subgoals at different levels of the search is needed to find a good solution. Consider, for example, the problem of moving two given packages from the Pittsburgh airport to the Boston airport. Accomplishing either goal individually, as a linear planner would do, would require using a different airplane (or a different trip of the same airplane) for each of the packages, which is clearly an inefficient way to solve the problem. By means of control rules, PRODIGY's nonlinear algorithm may delay flying the airplane until both packages are loaded.
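The inefficiency can be quantified with simple step counts. Assuming, as stated above, unlimited carrier capacity, and counting one step per load, unload or flight (both simplifications of this sketch), the three strategies compare as follows for n packages travelling between the same two airports.

```python
def linear_plan_length(n_packages, one_airplane=True):
    """Steps when each package is delivered separately: load, fly, unload per package."""
    steps = 3 * n_packages
    if one_airplane:
        steps += n_packages - 1   # the single airplane flies back to Pittsburgh between trips
    return steps                  # otherwise assumes one airplane per package at Pittsburgh

def interleaved_plan_length(n_packages):
    """Load every package, fly once, then unload every package."""
    return n_packages + 1 + n_packages

print(linear_plan_length(2), linear_plan_length(2, one_airplane=False), interleaved_plan_length(2))
# 7 6 5
```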

4.1.2  The  Machining  Process  Planning  Domain 

Section 1.1 described this domain and introduced an example taken from it. This domain is to date the largest one implemented in PRODIGY. It has 75 operators and 35 inference rules. [Gil, 1991] describes it in detail. Some of the problems used to test the domain were taken from real engineers' specifications as presented in [Hayes, 1990], and the number of operators (not including inference rules) in their solutions ranges between 35 and 70.

4.2  PRODIGY  Decisions  that  Affect  Plan  Quality 

The previous section presented the types of decision points in PRODIGY's search cycle. These decisions may influence the quality of the final plan, and the control knowledge that guides them can encode available expert knowledge. Some examples follow:

• Goal ordering: in the example in Section 1.1, choosing the right goal ordering reduced plan length. In
general, asymmetric goal interactions yield a preferred optimal goal ordering to minimize clobbering
of previously achieved goals. Here we present another example of how goal ordering decisions
influence plan quality. Suppose the problem posed to PRODIGY is to have a part with two holes,
one opening into another. This can be encoded as the conjunction of two goals, one for each hole.


‡In a casual-commitment strategy, at each decision point the planner commits to a particular alternative, and backtracks upon failure. This is in contrast to a least-commitment strategy, where decisions are deferred until all possible interactions are recognized.

Throughout this document several examples are presented. In them we follow PRODIGY's standard notation: instantiated operators are enclosed in < > and literals, both state literals and goals, are enclosed in ( ).


PRODIGY has to start by deciding which hole to work on first, i.e. which of the two goals to try to achieve first. The following advice may be used to guide PRODIGY's decision:

If a hole H1 opens into another hole H2, then one is recommended machining H2 before
H1 in order to avoid the risk of damaging the drill. [Descotte and Latombe, 1985]

This  advice  can  be  translated  into  a  search  control  rule.  The  antecedent  preconditions  match  when 
one  of  the  holes  actually  opens  into  another.  The  consequent  recommends  with  which  hole  to  start 
planning.  It  is  interesting  to  realize  that  the  expert  advice  may  apply  only  in  some  circumstances.  If 
machining  the  holes  in  the  opposite  order  is  faster,  the  right  decision  could  have  been  different,  and 
the  rule  should  only  fire  when  the  risk  of  damaging  the  drill  is  more  important  than  the  time  spent  on 
the  operations.  Therefore  different  control  rules,  or  control  rule  sets,  may  encode  different  strategies. 

In  some  cases  choosing  the  appropriate  goal  ordering  is  required  to  deal  with  goal  interactions, 
both  positive  and  negative.  Section  3  presented  some  examples  in  which  choosing  the  right  goal 
interleaving  produces  better  plans. 

• Operator preferences: suppose PRODIGY's goal is to reduce the size of a part. Some of the candidate
operators to achieve this goal are SHAPE, SHAPE-WITH-PLANER, and MILL. The following expert
advice may be useful to decide which operator to try first, and can be translated into one or more
control rules whose consequents propose the appropriate operator.

In most shaping and planing operations, cutting is done in one direction only. The return
stroke represents lost time. Thus these processes are slower than milling and broaching,
which cut continuously. On the other hand, shaping and planing use single-point tools
that are less expensive, are easier to sharpen, and are conducive to quicker set-ups than
the multiple-point tools of milling and broaching. This makes shaping or planing often
economical to machine one or a few pieces of a kind. ([Doyle, 1969], p. 597).

• Binding preferences: suppose PRODIGY is now asked to solve a problem in the logistics domain. The
goal is to move Package3 from the airport to the post office, and two trucks, Truck1 and Truck2, are
available at the airport to do this. Truck1 has a bigger capacity, and therefore is more expensive
to use, than Truck2. However, Truck2's driver is known to be less reliable than Truck1's driver. To
achieve the goal, PRODIGY picks operator UNLOAD-TRUCK. Then the next decision, a bindings
choice, is which truck to use. If the strategy is to keep the cost low, Truck2 should be preferred, but if
reliability of the plan is the major factor, Truck1 should be used in spite of its greater cost. Note that
the choice of bindings may influence the operators that follow. Section 5.2.1 presents this example in
detail.
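Each of the three kinds of advice above can be pictured as a function that orders the candidates at one decision point. The sketch below is illustrative only: the hole and truck names, the few_pieces threshold, and the cost and reliability figures are assumptions rather than values from the domains, and graphlib requires Python 3.9 or later.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter

# 1. Goal ordering: if hole h1 opens into hole h2, machine h2 first.
def order_hole_goals(holes, opens_into, avoid_drill_damage=True):
    if not avoid_drill_damage:            # a time-first strategy: the rule does not fire
        return list(holes)
    graph = {h: set() for h in holes}     # hole -> holes that must be machined before it
    for h1, h2 in opens_into:
        graph[h1].add(h2)
    return list(TopologicalSorter(graph).static_order())

# 2. Operator preference: single-point tools pay off for one or a few pieces.
def order_size_reduction_ops(batch_size, few_pieces=3):
    single_point = ["SHAPE", "SHAPE-WITH-PLANER"]
    if batch_size <= few_pieces:
        return single_point + ["MILL"]
    return ["MILL"] + single_point

# 3. Binding preference: which truck to bind in UNLOAD-TRUCK.
@dataclass
class Truck:
    name: str
    cost: float         # hypothetical relative cost of use
    reliability: float  # hypothetical probability that the delivery succeeds

def order_trucks(trucks, strategy):
    if strategy == "low-cost":
        return [t.name for t in sorted(trucks, key=lambda t: t.cost)]
    if strategy == "reliability":
        return [t.name for t in sorted(trucks, key=lambda t: t.reliability, reverse=True)]
    raise ValueError(f"unknown strategy: {strategy}")

trucks = [Truck("Truck1", cost=10.0, reliability=0.99), Truck("Truck2", cost=6.0, reliability=0.90)]
print(order_hole_goals(["hole1", "hole2"], [("hole1", "hole2")]))  # ['hole2', 'hole1']
print(order_size_reduction_ops(1))                                  # ['SHAPE', 'SHAPE-WITH-PLANER', 'MILL']
print(order_trucks(trucks, "low-cost"), order_trucks(trucks, "reliability"))
# ['Truck2', 'Truck1'] ['Truck1', 'Truck2']
```

Each function encodes one strategy; swapping the strategy argument corresponds to using a different control rule set, as discussed for the hole example.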


5  Work  to  Date 

In  order  to  establish  the  feasibility  of  the  research  proposed  in  this  document,  initial  investigations  were 
conducted  on  acquiring  control  rules  to  enhance  plan  quality.  In  this  section  we  describe  the  prototype 
implemented  to  date.  The  viability  of  the  approach  and  the  use  of  the  prototype  are  illustrated  with  two 
examples  from  the  logistics  transportation  domain. 

5.1 Semi-Automated (Interactive) Acquisition of Quality-Enhancement Control Rules

As mentioned in Section 2.6.1, all the methods to acquire control knowledge for PRODIGY focus on improving the planner's efficiency by reducing the search space. The work proposed here intends to acquire control knowledge that guides search in order to improve the quality of the solutions obtained by the planner. The metrics of solution quality on which we focus are the plan length and the cost of executing the plan. So far we have concentrated on the acquisition of search control rules. These rules provide guidance during search to make local decisions. However, it is not clear yet whether these local decisions will be enough to lead the problem solver towards better solutions, or whether PRODIGY's control structure will have to be extended. The control rules encode the knowledge extracted from a domain expert: knowledge about why a solution is better than another, and about how to modify a solution to improve its quality. We do not claim that these rules, or more generally, control knowledge, will necessarily guide the planner to find optimal solutions [Simon, 1981], but that the quality of the plans will incrementally improve with experience, as the planner sees new interesting problems in the domain and interacts with the domain expert.

Table  2  shows  the  process  we  have  implemented  to  date  in  order  to  acquire  control  knowledge  from  an 
expert.  The  next  subsections  describe  its  steps  in  more  detail. 


1. Run PRODIGY with the current set of control rules and obtain a solution Sp, or alternatively set Sp empty.

2. If Sp is not empty:
      Show Sp to the expert.
      Expert provides new solution Se by modifying Sp: adding, removing, or interchanging plan operators, or
      modifying their bindings.
   Otherwise (no solution Sp is available):
      Expert provides completely new solution, Se.

3. Test Se. If it solves the problem, continue. Else go back to step 2.

4. Compute the partial order P for Se.

5. Determine the set of decision points DP in the problem solving trace where control knowledge is required to
   obtain a solution S' that satisfies P.

6. Run PRODIGY stopping at each of the points in DP and acquire control knowledge from the expert to make the
   right choice.

7. If expert still wants to improve the current solution S',
      Set Sp = S'.
      Go to step 2.
   Otherwise terminate.

Table 2: The Basic Process for Incremental Control Knowledge Acquisition. The expert provides a solution he considers good, either by modifying the one proposed by the planner or by enumerating all the operators. Then the system acquires control knowledge that will guide the planner towards the better solution.
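The loop of Table 2 can be written as a driver that delegates each step to a callback. In this sketch every callback (the planner, the expert's input, the validity test, the partial-order computation, the decision-point search and the rule acquisition) is a hypothetical hook, and max_rounds is an added safeguard; in the real system steps 2 and 6 are interactive. The toy usage only exercises the control flow.

```python
def acquire_quality_rules(problem, rules, planner, expert_solution, expert_satisfied,
                          solves, partial_order, decision_points, ask_expert_rule,
                          max_rounds=10):
    """Sketch of Table 2; every callback is a hypothetical hook."""
    s_p = planner(problem, rules)                          # step 1: may be None
    for _ in range(max_rounds):
        s_e = expert_solution(problem, s_p)                # step 2: edit Sp, or build Se from scratch
        if not solves(problem, s_e):                       # step 3: back to step 2
            continue
        order = partial_order(problem, s_e)                # step 4
        for dp in decision_points(problem, rules, order):  # step 5
            rules.append(ask_expert_rule(problem, dp))     # step 6: one rule per decision point
        s_prime = planner(problem, rules)                  # planner now yields S' satisfying P
        if expert_satisfied(problem, s_prime):             # step 7
            return rules, s_prime
        s_p = s_prime
    return rules, s_p

toy_planner = lambda prob, rs: ["a", "b", "c"] if not rs else ["a", "c"]
rules, plan = acquire_quality_rules(
    "toy", [], toy_planner,
    expert_solution=lambda prob, sp: ["a", "c"],
    expert_satisfied=lambda prob, s: s == ["a", "c"],
    solves=lambda prob, s: s is not None,
    partial_order=lambda prob, s: {("a", "c")},
    decision_points=lambda prob, rs, p: ["choose-after-a"],
    ask_expert_rule=lambda prob, dp: f"rule@{dp}")
print(rules, plan)  # ['rule@choose-after-a'] ['a', 'c']
```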


5.1.1  Getting  a  Solution  from  the  Expert 

In step 2 the expert provides PRODIGY with a solution Se that he considers better than the one PRODIGY obtains with the control knowledge currently available (we call the latter Sp). The purpose is to acquire control knowledge so that, should PRODIGY see the same problem again, the better solution would be obtained. The expert may use Sp as a basis to build Se. However, in some cases having the planner find just one solution (Sp) is very expensive if control knowledge is not sufficient, and therefore we provide the capability for the expert to build a solution (Se) from scratch to start with.

In the domains we have experimented with, solutions tend to be long and the operators include a lot of parameters. It seems useful to provide the expert with good tools to input a solution. The current interface allows him to build a solution by putting together some or all of Sp's operators and adding new operators, if needed, by typing the operator name and all its bindings. We intend to extend the interface to allow editing of the old solution, propose default values for the parameters in the new operators, and facilitate step-wise execution of the plan, among other things.

Note that it is possible that an expert would prefer to solve the problem using operators not yet in the domain specification. The APPRENTICE system [Joseph, 1992] acquires such base-level or domain knowledge for PRODIGY.

5.1.2  Determining  Where  Control  Knowledge  Is  Needed 

In step 5, the system finds the decision points where PRODIGY's current control knowledge needs to be modified so that the expert's solution is found. In fact, the planner searches for any solution that satisfies the partial order obtained from Se. A partial order, or partially ordered plan, encodes a set of solutions, or totally ordered plans. All of them have the same operators, and satisfy a set of ordering constraints. One step op_i precedes another step op_j in the partial order if and only if op_i adds a precondition of op_j, or op_j deletes a precondition of op_i. The partial order of a plan can be obtained efficiently [Veloso et al., 1990], in negligible time compared to the time needed to generate the totally ordered plan. We are assuming that the quality of the plan is the same for all the plans encoded in the partial order, since they all have the same operators. For example, if two packages have to be loaded in the same truck, and we ignore package sizes and truck capacities, the order in which the packages are loaded is irrelevant with respect to plan quality. By allowing any solution in the partial order, we reduce the amount of control knowledge that needs to be acquired. However, this heuristic would not be valid in domains where constraints on the order of operators, other than the ones encoded in the partial order, influence the plan quality.
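The ordering rule just stated translates directly into code. The sketch below applies it to every pair of steps in a totally ordered plan and, for small plans only, enumerates the totally ordered plans encoded by the result; it is a brute-force illustration, not the algorithm of [Veloso et al., 1990], and the operator and literal names are hypothetical. The example is the two-package loading case mentioned above.

```python
from itertools import permutations

def partial_order(plan):
    """Pairs (i, j), i before j in the total order, that must keep their relative order."""
    edges = set()
    for i, (_, pre_i, add_i, _) in enumerate(plan):
        for j in range(i + 1, len(plan)):
            _, pre_j, _, del_j = plan[j]
            if add_i & pre_j or del_j & pre_i:   # op_i adds a precondition of op_j, or
                edges.add((i, j))                # op_j deletes a precondition of op_i
    return edges

def linearizations(n, edges):
    """All total orders of steps 0..n-1 consistent with the ordering constraints."""
    return [p for p in permutations(range(n))
            if all(p.index(i) < p.index(j) for i, j in edges)]

plan = [
    ("load-truck P1 T1 po", {("at-obj", "P1", "po"), ("at-truck", "T1", "po")},
     {("inside", "P1", "T1")}, {("at-obj", "P1", "po")}),
    ("load-truck P2 T1 po", {("at-obj", "P2", "po"), ("at-truck", "T1", "po")},
     {("inside", "P2", "T1")}, {("at-obj", "P2", "po")}),
    ("drive-truck T1 po ap", {("at-truck", "T1", "po")},
     {("at-truck", "T1", "ap")}, {("at-truck", "T1", "po")}),
]
edges = partial_order(plan)
print(sorted(edges))                     # [(0, 2), (1, 2)]
print(linearizations(len(plan), edges))  # [(0, 1, 2), (1, 0, 2)]
```

The two loads are left unordered, so either loading order is an acceptable solution and no control knowledge is needed to choose between them.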

DP is the set of decision points for which control knowledge needs to be incorporated. By looking at the trace for Sp and at the expert's plan, the system proposes a set of possible candidate points where a different decision should be made, and the preferred decisions themselves. PRODIGY starts to solve the problem again following the first of those recommendations. If it does not lead to a solution that satisfies the partial order, it is discarded and another one is tried in turn. For each recommendation this process is repeated recursively: when the planner realizes that the current path will not lead to a solution that satisfies the partial order, the path is abandoned, candidates for wrong decision points are found, and PRODIGY backtracks to one of those candidates and tries a different path.§ This process can be seen as a search in the space of decision points in the trace, until a set, DP, is found that leads to a solution S'.
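One way to picture how a path is recognized as unable to satisfy the partial order: given the operators applied so far, find the first one that either is not in the expert's plan or whose required predecessors have not yet been applied. The sketch below is a hypothetical simplification that compares operator names and assumes each name occurs once in Se; the edges are those the ordering rule of Section 5.1.2 yields for the direct load, fly, unload plan of the airplane example in Section 3.

```python
def first_violation(applied, expert_ops, edges):
    """Index of the first applied step that no plan in the partial order allows, or None.

    applied    -- operator names applied so far by the planner
    expert_ops -- operator names of the expert's plan Se
    edges      -- pairs (a, b) meaning operator a must precede operator b
    """
    remaining = list(expert_ops)
    done = set()
    for k, op in enumerate(applied):
        if op not in remaining:                              # not in Se (or used too often)
            return k
        if any(b == op and a not in done for a, b in edges):
            return k                                         # a required predecessor is missing
        remaining.remove(op)
        done.add(op)
    return None

LOAD = "load P1 A1 pit-airport"
FLY = "fly A1 pit-airport bos-airport"
UNLOAD = "unload P1 A1 bos-airport"
expert_ops = [LOAD, FLY, UNLOAD]
edges = {(LOAD, FLY), (LOAD, UNLOAD), (FLY, UNLOAD)}
print(first_violation([FLY], expert_ops, edges))        # 0
print(first_violation([LOAD, FLY], expert_ops, edges))  # None
```

A violation at step 0 points back to the decision that led to flying first, a candidate wrong decision point.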

5.1.3  Acquiring  the  Relevant  Knowledge 

If PRODIGY runs following the recommendations in DP, it will obtain plan S'. Each recommendation contains a decision point, the alternative that leads to S', and a reason why that alternative should be chosen (for example, an operator ordering would be violated otherwise, or a different binding would be chosen instead of the one the expert proposed). In step 6 PRODIGY restarts problem solving, stopping at each of those decision points and requesting the expert's advice. This advice is translated into search control rules that will fire making the appropriate choices. Note that the alternative to take is known at this point, and it becomes part of the rule's consequent. The rule's preconditions, or left-hand side, encode the situations in which that decision should be made. Some of the preconditions can be automatically extracted from the current meta-state. These are the current goal, and also the current operator if we are dealing with a bindings decision. Also, if the reason why the system detected that a different alternative should be taken relies on other available knowledge, that knowledge is added to the rule. At this point we need to acquire from the domain expert the knowledge that justifies the decision. We want to rely as little as possible on the expert's knowledge about the planner itself. In particular, it is hard even for PRODIGY experts to explain, for example, why the reason for the inefficient solution is a goal ordering at a particular point of the trace. On the other hand, we need to extract knowledge that can be operationalized, i.e., transformed into control rule preconditions that match the current meta-state.

§All this process is performed through domain-independent rules specially designed ...

The approach followed here is to get the expert's knowledge in terms of the current state, the top-level goals, and possibly the pending goals. To extract this knowledge, the expert is asked for the reasons, as far as possible in terms of the proposed solution: why he/she decided to use a particular operator or to change the order of some operators with respect to their order in the planner's solution. That information is captured in the recommendation. The expert interacts by pointing to menu items.

Once the items that will become part of the rule preconditions are chosen, they need to be generalized. At this point generalization simply means replacing constants by variables. However, this is not a trivial process. All objects have types in PRODIGY, and therefore new preconditions should be added that constrain the values that variables may take. In some cases the values are so specific that the constants have to remain. These constraints may be extracted from the type hierarchy, but in some cases the expert may decide to specify the set of values that the variable can take as a combination of different types (for example, any drill bit that is not specifically used to make a spot hole can be expressed in PRODIGY's language as (comp DRILL-BIT SPOT-DRILL)).
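Here is a minimal sketch of the generalization step. It assumes each constant has a single type, so PRODIGY's type hierarchy and combined types such as (comp DRILL-BIT SPOT-DRILL) are not modeled. Constants with a known type become variables, and the same constant always maps to the same variable. Distinct objects of the same type get a DIFF test, while untyped symbols such as LARGE stay constant. The function name and output format are hypothetical. The example uses the facts picked by the expert in Section 5.2.1.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Literal = Tuple[str, ...]


def generalize(literals: List[Literal], types: Dict[str, str]
               ) -> Tuple[List[Literal], List[Literal]]:
    """Replace typed constants by variables and return the generalized
    literals plus the type and inequality constraints they imply."""
    var_of: Dict[str, str] = {}
    count: Dict[str, int] = defaultdict(int)
    generalized: List[Literal] = []
    for pred, *args in literals:
        new_args = []
        for arg in args:
            if arg not in types:               # untyped symbols stay constant
                new_args.append(arg)
                continue
            if arg not in var_of:              # same constant -> same variable
                t = types[arg]
                count[t] += 1
                var_of[arg] = f"<{t.upper()}{count[t]}>"
            new_args.append(var_of[arg])
        generalized.append((pred, *new_args))
    constraints: List[Literal] = [("type", v, types[c]) for c, v in var_of.items()]
    # Distinct objects of the same type must stay distinct after generalization.
    objs = list(var_of)
    for i, c1 in enumerate(objs):
        for c2 in objs[i + 1:]:
            if types[c1] == types[c2]:
                constraints.append(("diff", var_of[c1], var_of[c2]))
    return generalized, constraints
```

On the pt1/pt2 capacity facts, this yields (capacity <TRUCK1> large), (capacity <TRUCK2> small), and a (diff <TRUCK1> <TRUCK2>) test. That is the same shape as rule ACQ32 in Section 5.2.1.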

It  is  not  hard  to  imagine  that  the  control  rules  acquired  from  the  expert  in  a  particular  problem  solving 
context  may  be  too  general  or  too  specific.  These  rules  then  provide  inappropriate  advice  when  the  situation 
is  slightly  different. 

5.1.4  Variations  to  the  Algorithm 

The algorithm in Table 2 may be modified to rely less on the domain expert. Table 3 reflects these changes. Instead of prompting the expert for another solution, PRODIGY can run in its multiple-solution mode until it finds a new solution. Then the system may decide by itself whether a given solution is better than another, or else it can ask the expert for his preference. Then the algorithm proceeds as before.

We have tested this variation in the transportation domain and ran into practical problems: PRODIGY may require an impractically large amount of search before it finds the next substantially better solution. Before it backtracks to a point that would lead to an interestingly different solution, it tries many lesser variations of the current one by selecting different alternatives near the leaves of the search tree. We would like to experiment with different schemes to solve this difficulty, such as imposing resource bounds (time or number of nodes), abandoning a path when the partial solution found so far looks worse than S_p,^11 or trying backtracking strategies other than chronological backtracking. These heuristics seem very domain-dependent, and we have not explored them so far.
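Two of the schemes mentioned above, a node budget and abandoning partial solutions that already look worse than S_p, can be sketched together as a depth-first search. This is a hypothetical illustration, not something PRODIGY provides. It assumes nonnegative, additive action costs (plan length is one example). Under that assumption, a partial plan whose cost already reaches the cost of S_p can safely be abandoned. For quality metrics without this property the pruning is only a heuristic, as the footnote notes.

```python
from typing import Callable, Hashable, List, Optional, Tuple

Successors = Callable[[Hashable], List[Tuple[str, float, Hashable]]]


def find_cheaper_plan(start: Hashable, is_goal: Callable[[Hashable], bool],
                      successors: Successors, bound: float, max_nodes: int
                      ) -> Optional[Tuple[List[str], float]]:
    """Depth-first search for a plan whose cost is below `bound` (the cost of
    S_p), expanding at most `max_nodes` nodes. Returns (plan, cost) or None."""
    expanded = 0

    def dfs(state, plan, cost, visited):
        nonlocal expanded
        if cost >= bound:
            return None                  # abandon: cannot beat S_p
        if is_goal(state):
            return plan, cost
        if expanded >= max_nodes:
            return None                  # resource bound exhausted
        expanded += 1
        for action, c, nxt in successors(state):
            if nxt not in visited:       # avoid state loops
                found = dfs(nxt, plan + [action], cost + c, visited | {nxt})
                if found is not None:
                    return found
        return None

    return dfs(start, [], 0.0, frozenset([start]))
```

The search returns the first plan it finds that is cheaper than S_p, not necessarily the cheapest one. That matches the goal of finding the *next* better solution.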


^10 A clear example of this deals with operator ordering (see example pgh1): if op1 has to be applied before op2 in the expert's solution, and PRODIGY has expanded op1 already, op2 cannot be expanded until op1 has been applied, or else the required ordering cannot be satisfied. Therefore the condition that op1 is expanded but not applied yet can be automatically added to the rule for rejecting op2, or even to a rule for rejecting the goal for which op2 is relevant.

^11 However, we cannot decide in general that the final solution in that path is going to be worse than S_p.

1. Run PRODIGY with the current set of control rules and obtain a solution S_p. If S_p is good enough, terminate.

2. Find the next solution S_n. If none is found, terminate.

3. Decide whether S_n is better than S_p, using prior knowledge or asking the expert.

4. If S_n is not better than S_p, go back to step 2.

5. Determine the set of decision points DP in the problem solving trace where control knowledge is required to obtain solution S_n.

6. Run PRODIGY, stopping at each of the points in DP, and acquire control knowledge from the expert to make the right choice.

7. Set S_p ← S_n. Go to step 2.


Table 3: A Modified Version of the Basic Process. In this case PRODIGY runs in its multiple-solutions mode and finds the better solution by itself, instead of having the human expert input it. It does, however, need knowledge about why one solution is better than another.
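The control loop of Table 3 can be sketched in a few lines. The callables are hypothetical placeholders. `next_solutions` stands for PRODIGY's multiple-solution mode, `better` for prior knowledge or the expert's judgment, and `learn` for steps 5 and 6 (finding DP and acquiring rules).

```python
from typing import Callable, Iterable, TypeVar

Plan = TypeVar("Plan")


def improve(initial: Plan, next_solutions: Iterable[Plan],
            good_enough: Callable[[Plan], bool],
            better: Callable[[Plan, Plan], bool],
            learn: Callable[[Plan, Plan], None]) -> Plan:
    """Keep the best solution so far (S_p) and learn control knowledge each
    time a better next solution (S_n) is found."""
    s_p = initial                      # step 1: first solution
    if good_enough(s_p):
        return s_p
    for s_n in next_solutions:         # step 2: next solution, if any
        if not better(s_n, s_p):       # steps 3-4: keep searching
            continue
        learn(s_p, s_n)                # steps 5-6: find DP, acquire rules
        s_p = s_n                      # step 7
    return s_p                         # no further solution: terminate
```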

5.2  Detailed  Examples 

In this section we present two examples that illustrate the viability of enhancing solution quality using control rules, and the use of the method described in Section 5.1. Section 5.2.1 presents a simple example in which rules control the use of resources in the plan according to both reliability and cost. The example in Section 5.2.2 illustrates how choosing the right goal interleaving leads to better plans, in particular shorter plans.


5.2.1  A  Simple  Example:  Choosing  Different  Resources 

In this problem the goal is to move packages from the airport (pgh-airport) to the post office (pgh-po), and there are two trucks available at the airport to do this, pt1 and pt2. In addition, there are other facts known about the trucks and the kind of solution preferred that are not used by the operators. Figure 6 summarizes this problem.


INITIAL STATE:

(driver pt1 reliable)

(capacity pt1 large)

(driver pt2 unreliable)

(capacity pt2 small)

(strategy resource-cost-min)

Figure 6: Initial State and Goal Statement for Problem cap. p3 has to be moved from the airport pgh-ap to the post office po. Using truck pt2 is cheaper because its capacity is smaller, but using truck pt1 produces a more reliable solution.

The planner automatically comes up with this solution, which uses pt1:

<load-truck packages pt1 pgh-airport>

<drive-truck pt1 pgh-airport pgh-po>

<unload-truck packages pt1 pgh-po>


but the expert prefers to use truck pt2 to solve this problem, since the strategy is to reduce the resource cost, and pt2 is cheaper because its capacity is smaller.

A dialog is initiated once the planner solves the problem. First, the new solution is built taking into account what the expert proposes: use pt2 instead. Then the knowledge acquisition module queries the expert in a way that requires no knowledge of PRODIGY internals:

Why  is  not  the  solution  good  enough? 

1.  Unnecessary  steps. 

2.  Resources  too  costly. 

3.  Reliability  problem. 

Answer: 2

This  is  the  initial  solution  obtained  by  the  planner: 

1. <load-truck packages pt1 pgh-airport>

2. <drive-truck pt1 pgh-airport pgh-po>

3. <unload-truck packages pt1 pgh-po>

Which resource do you want to replace? pt1
New value: pt2

In  which  operators  (optional): 

Testing  the  expert  solution. . . 

The  plan  given  is  a  solution  to  the  current  problem. 

Now  the  system  has  to  find  out  why  pt2  is  a  better  alternative: 

These  are  the  diffs  between  the  two  objects,  ptl  and  pt2: 

1. (capacity pt1 large)

2. (driver pt1 reliable)

3. (capacity pt2 small)

4. (driver pt2 unreliable)

Select  one  or  more:  1  3 

With  this  information,  a  new  control  rule  is  built  that  decides  about  the  resource  to  use: 

Run problem again to acquire binding control rules.

2 n2 (done)

4 n4 <*finish*>

5  n5  (at-obj  packages  pgh-po) 

There  is  a  new  binding  decision  stored  at  this  node. 

This  is  the  rule  acquired: 

(CONTROL-RULE ACQ32
  (IF (AND (TRUE-IN-STATE (STRATEGY RESOURCE-COST-MIN))
           (TRUE-IN-STATE (CAPACITY <TRUCK1> LARGE))
           (TRUE-IN-STATE (CAPACITY <TRUCK2> SMALL))
           (DIFF <TRUCK1> <TRUCK2>)))
  (THEN PREFER BINDINGS ((<TRUCK> . <TRUCK2>)) ((<TRUCK> . <TRUCK1>))))

Testing  the  control  rule: 

The lhs would match with these bindings:
(((<TRUCK2> . #<P-0: PT2 truck>) (<TRUCK1> . #<P-0: PT1 truck>)))


and this is the result of firing the rule:

Prefer ((<TRUCK> . #<P-0: PT2 truck>))
over ((<TRUCK> . #<P-0: PT1 truck>)).

7 n7 <unload-truck packages pt2 pgh-po> [1]

8 n8 (inside-truck packages pt2)

10 n10 <load-truck packages pt2 pgh-airport> [1]

11 n11 <LOAD-TRUCK PACKAGES PT2 PGH-AIRPORT> [1]

12 n12 (at-truck pt2 pgh-po)

14 n14 <drive-truck pt2 pgh-airport pgh-po>

15 n15 <DRIVE-TRUCK PT2 PGH-AIRPORT PGH-PO>

15 n16 <UNLOAD-TRUCK PACKAGES PT2 PGH-PO>

Achieved  top-level  goals. 

Solution: 

<load-truck  packages  pt2  pgh-airport> 

<drive-truck  pt2  pgh-airport  pgh-po> 

<unload-truck  packages  pt2  pgh-po> 

Note that in this example it is enough to replace the decision at n7. The other binding decisions are determined by this one.
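The effect of rule ACQ32 can be reproduced with a small pattern matcher over the facts of Figure 6. This is a simplified sketch, not PRODIGY's rule interpreter. The TRUE-IN-STATE conditions are reduced to tuple matching against the state, the DIFF test to an inequality check, and PREFER BINDINGS to moving the preferred truck to the front of the candidate list. All function names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Fact = Tuple[str, ...]


def is_var(sym: str) -> bool:
    """Variables are written <NAME>, as in PRODIGY rules."""
    return sym.startswith("<") and sym.endswith(">")


def match(patterns: List[Fact], facts: List[Fact],
          binding: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every extension of `binding` under which all patterns are facts."""
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        if len(fact) != len(first):
            continue
        b = dict(binding)
        # setdefault binds a new variable or checks an existing binding
        if all((b.setdefault(p, f) == f) if is_var(p) else p == f
               for p, f in zip(first, fact)):
            yield from match(rest, facts, b)


# TRUE-IN-STATE conditions of ACQ32.
ACQ32_LHS: List[Fact] = [("strategy", "resource-cost-min"),
                         ("capacity", "<TRUCK1>", "large"),
                         ("capacity", "<TRUCK2>", "small")]


def acq32_preferences(state: List[Fact]) -> List[Tuple[str, str]]:
    """(preferred, dispreferred) truck pairs produced by firing ACQ32."""
    return [(b["<TRUCK2>"], b["<TRUCK1>"])
            for b in match(ACQ32_LHS, state, {})
            if b["<TRUCK1>"] != b["<TRUCK2>"]]      # the DIFF test


def order_candidates(candidates: List[str],
                     prefs: List[Tuple[str, str]]) -> List[str]:
    """Move preferred candidates to the front; sorted() keeps the rest in order."""
    preferred = {p for p, _ in prefs}
    return sorted(candidates, key=lambda c: c not in preferred)
```

Without the (strategy resource-cost-min) fact the rule does not match, and the planner's default choice of pt1 stands.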

5.2.2  Finding  the  Right  Goal  Interleaving 

In this problem the key to finding a good solution lies in interleaving the work on the different goals correctly. Plan length is the metric of plan quality considered (although in this case the shorter solution also makes more efficient use of the available resources). The example shows the use of the algorithm we presented in Section 5.1. The problem goal is to move a package, p1, from pgh-po to bos-po. In the initial state there is a truck at each post office, and two planes at pgh-airport. Figure 7 shows the initial state and goal statement for this problem.


[Figure 7 panels: INITIAL STATE and GOAL STATEMENT, each showing the post offices (po) and airports (ap).]


Figure 7: Initial State and Goal Statement for Problem pgh1. The problem goal is to move a package, p1, from pgh-po to bos-po. In the initial state there is a truck at each post office, and two planes at pgh-airport.

PRODIGY solves the problem using some default control knowledge and obtains an inefficient solution (I will refer to this as the planner's solution):

1. <load-truck package1 pgh-truck pgh-po>

2. <fly-airplane airplane1 pgh-airport bos-airport>    Airplane flies unnecessarily.

3. <drive-truck pgh-truck pgh-po pgh-airport>

4. <fly-airplane airplane1 bos-airport pgh-airport>    Airplane returns.

5. <unload-truck package1 pgh-truck pgh-airport>

6. <load-airplane package1 airplane1 pgh-airport>    Once package1 is loaded,

7. <fly-airplane airplane1 pgh-airport bos-airport>    flying makes sense.

8. <unload-airplane package1 airplane1 bos-airport>    Package arrives in Boston,

9. <drive-truck bos-truck bos-po bos-airport>

10. <load-truck package1 bos-truck bos-airport>    and goes by truck to bos-po.

11. <drive-truck bos-truck bos-airport bos-po>

12. <unload-truck package1 bos-truck bos-po>

Note that airplane1 flies unnecessarily. Now the system presents the solution obtained by the planner to the expert and prompts him for a new and better solution. He may vary the order of the steps, remove steps, or add steps, or the solution can be completely different. In fact, the expert could provide a solution without having the planner solve the problem first. This is useful when the planner gets lost searching thousands of nodes because it lacks the appropriate control knowledge. However, it is often easier to critique or modify a solution, so the planner usually offers one to the expert as a starting point. Once the expert gives the solution he considers better, PRODIGY first tests it to make sure that it will actually solve the given problem.

Enter  sequence  of  operator  numbers  with  the  new  ordering  of  steps: 

(put a * for a diff operator) 1 3 5 6 7 8 9 10 11 12

The  expert  solution  is: 

1. #<LOAD-TRUCK [<OBJ> PACKAGE1] [<TRUCK> PGH-TRUCK] [<LOC> PGH-PO]>

3. #<DRIVE-TRUCK [<TRUCK> PGH-TRUCK] [<LOC-FROM> PGH-PO] [<LOC-TO> PGH-AIRPORT]>

5. #<UNLOAD-TRUCK [<OBJ> PACKAGE1] [<TRUCK> PGH-TRUCK] [<LOC> PGH-AIRPORT]>

6. #<LOAD-AIRPLANE [<OBJ> PACKAGE1] [<AIRPLANE> AIRPLANE1] [<LOC> PGH-AIRPORT]>

7. #<FLY-AIRPLANE [<AIRPLANE> AIRPLANE1] [<LOC-FROM> PGH-AIRPORT] [<LOC-TO> BOS-AIRPORT]>

8. #<UNLOAD-AIRPLANE [<OBJ> PACKAGE1] [<AIRPLANE> AIRPLANE1] [<LOC> BOS-AIRPORT]>

9. #<DRIVE-TRUCK [<TRUCK> BOS-TRUCK] [<LOC-FROM> BOS-PO] [<LOC-TO> BOS-AIRPORT]>

10. #<LOAD-TRUCK [<OBJ> PACKAGE1] [<TRUCK> BOS-TRUCK] [<LOC> BOS-AIRPORT]>

11. #<DRIVE-TRUCK [<TRUCK> BOS-TRUCK] [<LOC-FROM> BOS-AIRPORT] [<LOC-TO> BOS-PO]>

12. #<UNLOAD-TRUCK [<OBJ> PACKAGE1] [<TRUCK> BOS-TRUCK] [<LOC> BOS-PO]>

Testing  the  expert  solution... 

The plan given is a solution to the current problem.

The system currently implemented detects the point(s) at which wrong decisions were made. It may backtrack and try different alternatives until it solves the problem, obtaining the solution proposed by the expert. More precisely, it obtains a solution that satisfies the partial order derived from the expert's solution by enforcing dependencies between operator preconditions and previous operator effects. In particular, it assumes that if two operators are not ordered in the partial order, their order does not matter. In this case airplane1 flies unnecessarily because of a wrong goal interleaving after n26: achieving the goal (at-airplane airplane1 bos-airport) should be postponed until the package is inside the airplane. This was (part of) PRODIGY's trace:

5 n5 (at-obj package1 bos-po)

7 n7 <unload-truck package1 bos-truck bos-po>

8 n8 (inside-truck package1 bos-truck)

10 n10 <load-truck package1 bos-truck bos-po> [1] ...goal loop with node 5

10 n11 <load-truck package1 bos-truck bos-airport>

11 n12 (at-obj package1 bos-airport)

13 n14 <unload-truck package1 bos-truck bos-airport> ...goal loop with node 8

13 n16 <unload-airplane package1 airplane1 bos-airport> [1]

14 n17 (inside-airplane package1 airplane1)

16 n19 <load-airplane package1 airplane1 pgh-airport>

17 n20 (at-obj package1 pgh-airport)

19 n22 <unload-truck package1 pgh-truck pgh-airport>

20 n23 (inside-truck package1 pgh-truck)

22 n25 <load-truck package1 pgh-truck pgh-po> [1]

23 n26 <LOAD-TRUCK PACKAGE1 PGH-TRUCK PGH-PO> [3]

24 n27 (at-airplane airplane1 bos-airport) [2]

26 n29 <fly-airplane airplane1 pgh-airport bos-airport>

27 n30 <FLY-AIRPLANE AIRPLANE1 PGH-AIRPORT BOS-AIRPORT> [2]

The system points out the crucial decision in the current example, namely:

—  Wrong  decision  made  after  node 

#<APPLIED-OP-NODE 26 #<LOAD-TRUCK [<OBJ> PACKAGE1] [<TRUCK> PGH-TRUCK] [<LOC> PGH-PO]>>.

Goal #<AT-AIRPLANE AIRPLANE1 BOS-AIRPORT> was chosen.

Goal(s) #<AT-TRUCK PGH-TRUCK PGH-AIRPORT> could have been chosen instead.
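The partial order mentioned above can be illustrated with a STRIPS-style sketch. Assume each step of the expert's totally ordered solution has precondition, add, and delete sets. The sketch keeps step i before step j only if i adds a precondition of j, or if j deletes a precondition of i. This is one plausible reading of "enforcing dependencies between operator preconditions and previous operator effects", not PRODIGY's code. The five steps form a fragment of problem pgh1, and the abbreviated fact names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Set, Tuple


class Step(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def step(name: str, pre: Iterable[str], add: Iterable[str],
         delete: Iterable[str]) -> Step:
    return Step(name, frozenset(pre), frozenset(add), frozenset(delete))


def partial_order(steps: List[Step]) -> Set[Tuple[int, int]]:
    """Keep only the orderings forced by dependencies: i before j if i adds a
    precondition of j, or if j deletes a precondition of i."""
    order: Set[Tuple[int, int]] = set()
    for i in range(len(steps)):
        for j in range(i + 1, len(steps)):
            if steps[i].add & steps[j].pre or steps[j].delete & steps[i].pre:
                order.add((i, j))
    return order


def satisfies(plan: List[int], order: Set[Tuple[int, int]]) -> bool:
    """True if `plan` (a permutation of step indices) respects every pair."""
    pos = {s: k for k, s in enumerate(plan)}
    return all(pos[i] < pos[j] for i, j in order)


# Fragment of the expert's solution to pgh1 (hypothetical, abbreviated facts).
EXPERT = [
    step("load-truck", {"pkg@pgh-po", "truck@pgh-po"}, {"pkg-in-truck"}, {"pkg@pgh-po"}),
    step("drive-truck", {"truck@pgh-po"}, {"truck@pgh-ap"}, {"truck@pgh-po"}),
    step("unload-truck", {"pkg-in-truck", "truck@pgh-ap"}, {"pkg@pgh-ap"}, {"pkg-in-truck"}),
    step("load-airplane", {"pkg@pgh-ap", "plane@pgh-ap"}, {"pkg-in-plane"}, {"pkg@pgh-ap"}),
    step("fly-airplane", {"plane@pgh-ap"}, {"plane@bos-ap"}, {"plane@pgh-ap"}),
]
```

An ordering in which the airplane flies before it is loaded violates the (load-airplane, fly-airplane) pair. The expert's ordering does not.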

To avoid making the wrong decision, PRODIGY has to acquire control knowledge. Our goal is to extract the relevant knowledge from the expert and express it in the form of control rules. First, this is the rule I came up with when writing the control rule set by hand:

(control-rule HAND-CODED ;; reject moving the airplane if loading is needed
  (if (and (candidate-goal (at-airplane <airplane> <new-loc>))
           (known (at-airplane <airplane> <loc>))
           (expanded-operator (LOAD-AIRPLANE <obj> <airplane> <loc>))))
  (then reject goal (at-airplane <airplane> <new-loc>)))
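Read procedurally, HAND-CODED filters the candidate goals at a goal-choice node. The Python sketch below is a hypothetical rendering of that test, not PRODIGY's rule interpreter. It assumes the list of expanded operators contains only operators that have not been applied yet. The example reproduces the situation after node n26.

```python
from typing import List, Set, Tuple

Atom = Tuple[str, ...]


def hand_coded_filter(candidate_goals: List[Atom], state: Set[Atom],
                      expanded_ops: List[Atom]) -> List[Atom]:
    """Drop (at-airplane <airplane> <new-loc>) when the airplane is known to be
    at <loc> and (load-airplane <obj> <airplane> <loc>) has been expanded
    (assumed: expanded but not yet applied)."""
    def rejected(goal: Atom) -> bool:
        if goal[0] != "at-airplane":
            return False
        plane = goal[1]
        return any(op[0] == "load-airplane" and op[2] == plane
                   and ("at-airplane", plane, op[3]) in state
                   for op in expanded_ops)

    return [g for g in candidate_goals if not rejected(g)]
```

After n26 only (at-truck pgh-truck pgh-airport) survives, which is the alternative the system reported above.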


We want to rely as little as possible on the expert's knowledge about PRODIGY. In particular, it is hard even for PRODIGY experts to explain why the reason for the inefficient solution is a goal ordering at a particular point of the trace. The code that detects the wrong decision point would, by itself, be a considerable help even to PRODIGY experts.

The approach followed here is to get the expert's knowledge in terms of the current state, the top-level goals, and possibly the pending goals. To extract this knowledge, the expert is asked for the reasons why he decided to use a particular operator or to change the order of some operators with respect to their order in the planner's solution. This is the rule that could be learned for this problem based on that information:


(control-rule TEST2
  (if (and (candidate-goal (at-airplane <airplane> <dest-airport>))
           (expanded-operator (load-airplane <obj> <airplane> <orig-airport>))
           (diff <dest-airport> <orig-airport>)
           (expanded-goal (at-obj <obj> <dest-po>))
           (known (inside-truck <obj> <truck>))
           (known (at-truck <truck> <orig-po>))
           (known (at-airplane <airplane> <orig-airport>))
           (same-city <orig-airport> <orig-po>)))
  (then reject goal (at-airplane <airplane> <dest-airport>)))

The first three meta-predicates are obtained from the current meta-state. In particular, the operator ordering violated at the node was that <fly-airplane airplane1 pgh-airport bos-airport> should be applied after <load-airplane package1 airplane1 pgh-airport>. Therefore, once load-airplane has been chosen, at-airplane cannot be expanded until load-airplane is applied, so that the order is not violated. This gives us the first three preconditions and also allows us to write a goal reject rule (instead of a goal preference rule).

The rest of the preconditions come from the information extracted from the expert. He was presented with the current state and the top-level goals, namely:

THE CURRENT STATE IS:

*(inside-truck package1 pgh-truck)
*(at-airplane airplane1 pgh-airport)
*(at-truck pgh-truck pgh-po)
(part-of bos-truck boston)
(loc-at pgh-po Pittsburgh)
(loc-at bos-po boston)
(same-city bos-po bos-airport)
(same-city pgh-airport pgh-po)
(at-airplane airplane2 pgh-airport)
(at-truck bos-truck bos-po)
(part-of pgh-truck Pittsburgh)
(loc-at pgh-airport Pittsburgh)
(loc-at bos-airport boston)
*(same-city pgh-po pgh-airport)
(same-city bos-airport bos-po)

THE TOP LEVEL GOALS ARE:

*(at-obj package1 bos-po)

The items marked with * are the ones the expert picked, and they appear in the rule after generalization.

Obviously, the state can be very large, with many irrelevant features. Remember that we are asking the expert why he preferred a given operator, or a given ordering between two operators. In order to focus on the relevant state features, we could make use of these operators.

Note that the rule TEST2 has more conditions than HAND-CODED, the one I wrote initially by hand. Note also that HAND-CODED is more general (it applies to more situations). We will have to analyze whether there is a way to automatically get rid of some of the preconditions of TEST2.

The goal mentioned in (expanded-goal (at-obj <obj> <dest-po>)) is an ancestor, in the subgoaling structure, of the operator mentioned in the second precondition, <load-airplane <obj> <airplane> <orig-airport>>. But if we remove it, the rule may be overgeneral: it would also fire if (at-obj <obj> <dest-po>) is not an expanded goal, for example when the top-level goal is (inside-airplane <obj> <airplane>). The generalization may be good, and in addition we could ask the user to guide it:

You mentioned that a relevant goal was:

(at-obj package1 bos-po)

Is it ok if the expanded goal is instead one of these:

(inside-truck package1 bos-truck)

(at-obj package1 bos-airport)

(inside-airplane package1 airplane1)

where the goals were obtained by going up the search tree. In this case the answer is yes, because the only thing that matters is that load-airplane, already expanded, should be applied before working on fly-airplane.
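Collecting alternative goals "by going up the search tree" can be sketched as a walk along parent links. The parent links below are our reading of the printed trace of problem pgh1. Abandoned goal-loop nodes and unprinted intermediate nodes are omitted, and the function name is hypothetical.

```python
from typing import Dict, List, Optional

# Parent links read off the pgh1 trace (goal-loop nodes n10, n14 omitted).
PARENT: Dict[str, str] = {"n19": "n17", "n17": "n16", "n16": "n12",
                          "n12": "n11", "n11": "n8", "n8": "n7", "n7": "n5"}
GOAL_OF: Dict[str, str] = {"n5": "(at-obj package1 bos-po)",
                           "n8": "(inside-truck package1 bos-truck)",
                           "n12": "(at-obj package1 bos-airport)",
                           "n17": "(inside-airplane package1 airplane1)"}


def ancestor_goals(node: str, parent: Dict[str, str],
                   goal_of: Dict[str, str]) -> List[str]:
    """Walk up from `node`, collecting the goal of every ancestor goal node
    (nearest first). Operator nodes are skipped."""
    goals: List[str] = []
    current: Optional[str] = parent.get(node)
    while current is not None:
        if current in goal_of:
            goals.append(goal_of[current])
        current = parent.get(current)
    return goals
```

Starting from the load-airplane node n19, the last goal returned is the one the expert originally mentioned. The other three are the alternatives offered in the dialog.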

5.3  Assumptions  and  Limitations 

The approach presented here makes some assumptions about the way control knowledge is available and can be expressed to guide the problem solver. First, we assume that global strategies to obtain good plans can be encoded as control rules that guide local decisions. We also assume that an optimal decision is not required at each point, but rather that a qualitative model of the utility of the actions can be acquired and expressed as a set of control rules. A third assumption is that the factual knowledge, or domain knowledge, is correct and complete. We only worry about acquiring search control knowledge. Other work on the PRODIGY architecture [Carbonell and Gil, 1990; Joseph, 1992] has focused on the acquisition and refinement of domain knowledge. Lastly, this model assumes that experts can give reasons for specific strategic decisions, rather than simply intuit correct sequences of decisions.


6  Learning  Quality-Enhancing  Control  Rules 

This section presents the work currently in progress on the automatic acquisition of quality-enhancing control knowledge. Our goal is to have a system that improves the quality of the plans it generates with experience, by learning control knowledge to guide the search. Figure 8 shows the architecture of the current system. The system is given a domain theory (operators and inference rules) and a domain-dependent objective function that describes the quality of the plans. It is also given problems to solve in that domain. The system learns control knowledge by analyzing the problem-solving episodes. In particular, it compares the search trace for the planner's solution under the current control knowledge with another search trace corresponding to a better solution (better according to the evaluation function). The latter search trace is obtained either by letting the problem solver search further until a better solution is found, or by asking a human expert for a better solution and then producing a search trace that leads to that solution. The control knowledge learned in this way leads future problem solving towards better quality plans. In some sense there is a change of representation, from the knowledge about quality encoded in the objective function into knowledge that the planner may use at problem solving time. We do not claim that this control knowledge will necessarily guide the planner to optimal solutions, but rather that the quality of the plans will incrementally improve with experience, as the planner sees new interesting problems in the domain.


[Figure 8 diagram labels: Problems; Control Knowledge for better plans; Control Knowledge for faster planning]

Figure 8: Architecture of a system to learn control knowledge to improve the quality of plans.

Section 2 presented a taxonomy of quality metrics. A goal of our work is to validate and expand that taxonomy. We are exploring some parts of the taxonomy in more depth, giving precise definitions of quality metrics for them. In particular, we focus on reducing the execution cost of the plans generated, taking cost as the sum of the execution costs of the plan operators. We also expect to explore the possibility of using cases as control knowledge, and of using derivational analogy to extract and store those cases. Currently the derivational analogy module of PRODIGY [Veloso, 1992] stores past problem-solving experiences as cases and reuses them to solve similar problems, obtaining a considerable improvement in problem solving efficiency. The use of one or more past cases to solve the current problem may lead to shorter plans, as reported in [Veloso, 1992]. This was a surprising result but not the focus of that work. Note, in fact, that if the solution stored for a problem was not a good one, it may be reused to solve subsequent problems without any attempt to find a better solution. Using this approach to learn quality-enhancing control rules may affect the similarity metric used to find relevant cases, and the language used to express the justifications, in order to capture guidance about why the proposed solution is good.
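The execution-cost metric can be sketched as a sum over the operators of a plan. The cost table here is hypothetical, with flying assumed to be much more expensive than driving, loading, or unloading. With an empty table every operator costs 1, and the metric reduces to plan length. The plans are the planner's and the expert's solutions to problem pgh1 from Section 5.2.2, written as operator names only.

```python
from typing import Dict, List

# Operator names of the planner's 12-step solution to pgh1.
PLANNER: List[str] = ["load-truck", "fly-airplane", "drive-truck", "fly-airplane",
                      "unload-truck", "load-airplane", "fly-airplane",
                      "unload-airplane", "drive-truck", "load-truck",
                      "drive-truck", "unload-truck"]
# The expert's solution keeps steps 1 3 5 6 7 8 9 10 11 12 (drops steps 2 and 4).
EXPERT: List[str] = [op for k, op in enumerate(PLANNER) if k not in (1, 3)]

# Hypothetical relative costs; operators not listed cost `default`.
HYPOTHETICAL_COSTS: Dict[str, float] = {"fly-airplane": 10.0, "drive-truck": 2.0}


def plan_cost(plan: List[str], cost: Dict[str, float], default: float = 1.0) -> float:
    """Execution cost of a plan as the sum of its operators' costs."""
    return sum(cost.get(op, default) for op in plan)
```

With unit costs the expert's plan is 2 steps shorter (10 vs. 12). Under the hypothetical table the gap widens to 22 vs. 42, because both removed steps are flights.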

We plan to evaluate this work at each phase by testing empirically how plan quality improves with experience, as the number of problems seen by the system increases. The evaluation will be done for two complex domains, namely those presented in Section 4.1, with sets of randomly generated problems in each domain. We also want to study whether the addition of quality-enhancing control knowledge degrades problem solving efficiency, enhances it, or is orthogonal to it. If the first alternative is true, we need to establish trade-offs between planning efficiency and plan execution efficiency.

7  Related  Work 

Although there have been a number of systems that learn control knowledge for planning systems, most of them are oriented towards improving search efficiency. There has not been much research on either improving the quality of plans or acquiring control knowledge from human experts. In this section we present some work on planning systems that are concerned with solution quality, and also on systems that acquire control knowledge.

7.1  Work  on  Planning  Systems  and  Plan  Quality 

Several domain-dependent planners for the process planning domain deal with the quality of plans. Hayes' MACHINIST program [Hayes, 1990] generates plans in a machining process planning domain. In this work the measure of plan quality is solution length. Human machinists often spend a large amount of time in the early planning stages looking over the part specification for feature interactions and exploring the limitations that those features impose on the plan. Machinists have specialized knowledge that helps them quickly focus on the situations in which interactions are likely. This knowledge is acquired through experience. Hayes analyzed the way features interact and encoded this specialized knowledge in the form of rules in the MACHINIST program. This program first constructs a plan that deals with feature interactions, and retrieves from memory a plan to square the part (squaring is getting the raw material into a square and […] shape with the minimum waste of material). Then these two plans are merged to produce a plan as short as possible.

SIPP [Nau and Chang, 1985] is a process planning system that produces plans for the creation of metal parts. It uses a frame hierarchy to represent problem solving knowledge. In particular, actions have cost slots that contain relative costs derived from actual process costs and shop preferences. The problem solving strategy uses a least-cost-first branch and bound algorithm to find the least-cost sequence of processes for making each of the part's machinable surfaces. SIPP selects the least-cost manufacturing method for an individual feature, in isolation from considerations about other features, and therefore it does not try to find an overall low-cost plan. SIPP is domain-dependent, although its authors anticipate that it will be useful in other domains as well.
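A least-cost-first search for a single surface can be sketched with a priority queue. This illustrates the general technique and is not SIPP's implementation. The processes and relative costs in the example are hypothetical. Because each surface is planned separately, a search like this cannot account for savings shared between features, which is the limitation noted above.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical process graph: surface state -> [(process, cost, next state)].
Graph = Dict[str, List[Tuple[str, float, str]]]

HOLE: Graph = {"raw": [("drill", 2.0, "drilled"), ("bore", 6.0, "finished")],
               "drilled": [("ream", 3.0, "finished"), ("bore", 5.0, "finished")]}


def least_cost_processes(graph: Graph, start: str, goal: str
                         ) -> Optional[Tuple[float, List[str]]]:
    """Always extend the cheapest partial sequence. With nonnegative costs,
    the first time the goal is popped its cost is minimal; costlier partial
    sequences are never extended past it (the branch-and-bound effect)."""
    frontier: List[Tuple[float, str, List[str]]] = [(0.0, start, [])]
    best: Dict[str, float] = {start: 0.0}
    while frontier:
        cost, state, seq = heapq.heappop(frontier)
        if state == goal:
            return cost, seq
        if cost > best.get(state, float("inf")):
            continue                      # stale entry: a cheaper path exists
        for process, c, nxt in graph.get(state, []):
            new_cost = cost + c
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, seq + [process]))
    return None
```

For the hypothetical hole, drilling then reaming (cost 5) beats boring directly (cost 6).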

Descotte and Latombe's planner [Descotte and Latombe, 1985] is a process planner for metal cutting that uses a […] algorithm. Knowledge is represented by manufacturing rules: the left-hand side consists of conditions about the desired part, the available machines, and/or the machining plan. The right-hand side contains pieces of advice representing technological and economical preferences, and so it encodes knowledge about the quality of the plans. In contrast with PRODIGY, there is no separation in the representation between domain knowledge and search control knowledge. Experts have to assign a weight to each piece of advice according to the importance of its satisfaction. These weights are an extremely condensed representation of a large body of knowledge, and human experts have difficulty making them explicit. The initial state of a problem contains global pieces of information, such as the quality desired for the part. As far as we know, this system has not been applied to other domains.

ISIS [Fox and Smith, 1984] is a constraint-directed factory scheduling system. The conflicting nature of constraints in this domain prompted the introduction of constraint relaxation techniques. The problem solving strategy finds a solution that best satisfies the constraints. Some of the constraints are physical and must be satisfied, while others can be seen as approximations of a simple profit constraint. Different constraints have different importance or priority, which may change from order to order. ISIS provides tools with which the user can construct and alter schedules interactively, at different levels of abstraction (ISIS automatically fills in the other levels). The tools are in charge of maintaining the consistency of the schedule and identifying decisions that result in poorly satisfied constraints. However, they do not facilitate the acquisition of new knowledge.

SIPE [Wilkins, 1988] is a domain-independent classical planner that reasons with partially ordered
plans. It incorporates reasoning about resources, and interleaving of planning and execution. SIPE has been
applied to the scheduling of packaging lines at a brewery [Wilkins, 1989]. The problem is to generate plans
that meet as many orders as possible while satisfying all the physical constraints, and to minimize the waste
from flushing lines. Domain-specific knowledge about search control and the utility of the plans, such as
why flushing is or is not needed, is encoded in the operators. To take advantage of existing connections and
avoid flushing, SIPE relies on the preference order given to the operators. Therefore there is no separation
between domain knowledge and control knowledge. According to Wilkins, it would be straightforward to
implement a best-first search using SIPE’s context mechanism to find optimal plans, provided someone supplies
a good measure of plan utility; however, that is also true for PRODIGY. SIPE offers a domain-independent
graphical interface to view the partial plans produced as graphs. This interface allows interactive control
of the search by letting the user watch and, when desired, guide and/or control the planning or replanning
process. This is useful for debugging and to guide the solution of larger problems that would otherwise not be solved
in a reasonable amount of time. However, as in the case of ISIS, this interaction mechanism does not
explicitly facilitate the acquisition of new knowledge.
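To make the best-first idea concrete, here is a minimal sketch of best-first (uniform-cost) search over plans, assuming plan quality is simply the sum of non-negative per-action costs. It is not SIPE’s context mechanism or PRODIGY’s search procedure; the function `best_first_plan`, its successor interface, and the toy routing domain are hypothetical.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

# successors(state) yields (action_name, next_state, step_cost) triples.
Successors = Callable[[Hashable], Iterable[Tuple[str, Hashable, float]]]


def best_first_plan(
    start: Hashable,
    is_goal: Callable[[Hashable], bool],
    successors: Successors,
) -> Tuple[Optional[List[str]], float]:
    """Always expand the cheapest partial plan first (uniform-cost search).

    With non-negative step costs, the first goal state popped from the
    frontier is reached by a minimum-cost plan.
    """
    counter = 0  # tie-breaker so heapq never has to compare states
    frontier = [(0.0, counter, start, [])]
    best_cost = {start: 0.0}
    while frontier:
        cost, _, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan, cost
        if cost > best_cost.get(state, float("inf")):
            continue  # stale entry: a cheaper path to this state was found later
        for action, nxt, step_cost in successors(state):
            new_cost = cost + step_cost
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                counter += 1
                heapq.heappush(frontier, (new_cost, counter, nxt, plan + [action]))
    return None, float("inf")


# Hypothetical toy domain: getting a package from the depot to location A.
ROUTES = {
    "depot": [("truck-to-A", "A", 4.0), ("plane-to-B", "B", 1.0)],
    "B": [("truck-B-to-A", "A", 1.0)],
    "A": [],
}

plan, cost = best_first_plan("depot", lambda s: s == "A", lambda s: ROUTES[s])
print(plan, cost)  # ['plane-to-B', 'truck-B-to-A'] 2.0
```

In this toy example the cheaper two-step plan beats the one-step plan, so minimizing plan length and minimizing cost can disagree; the search is only as good as the utility measure it is given.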

SteppingStone [Ruby and Kibler, 1990] heuristically decomposes a problem into simpler subproblems,
and then learns to deal with the interactions that arise between the subproblems. The system allows
hard constraints, which must be met and usually outline key aspects of the problem, and soft constraints,
which measure the quality of a solution and are usually real-valued. The system learns to optimize soft
constraints as well as solve hard constraints. Problem solving is viewed as moving to states where the goal
is successively closer to completion. EASe [Ruby and Kibler, 1992] is a generalization of SteppingStone,
in which additional problem-solving knowledge is learned and stored in the form of episodes (or cases).
These episodes encode exceptions to the core knowledge. EASe has been applied to design problems in
which an initial solution obtained by an application system is improved by using the episodes.
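The distinction between hard and soft constraints can be illustrated with a small sketch, assuming a finite set of candidate solutions is already available: hard constraints act as filters, and soft constraints contribute weighted real-valued scores. This is not SteppingStone’s decomposition or learning method; `Constraints`, `best_solution`, the weights, and the toy schedules are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Solution = Dict[str, float]


@dataclass
class Constraints:
    # Hard constraints: every one must hold for a solution to be acceptable.
    hard: Sequence[Callable[[Solution], bool]]
    # Soft constraints: (weight, real-valued score); higher scores are better.
    soft: Sequence[Tuple[float, Callable[[Solution], float]]]


def quality(solution: Solution, constraints: Constraints) -> float:
    """Weighted sum of the soft-constraint scores."""
    return sum(w * score(solution) for w, score in constraints.soft)


def best_solution(candidates: List[Solution], constraints: Constraints) -> Optional[Solution]:
    """Discard candidates violating any hard constraint, then maximize quality."""
    feasible = [c for c in candidates if all(h(c) for h in constraints.hard)]
    if not feasible:
        return None
    return max(feasible, key=lambda c: quality(c, constraints))


# Hypothetical schedules: finishing time (hours) and cost (arbitrary units).
schedules = [
    {"id": 1, "finish": 12, "cost": 3.0},
    {"id": 2, "finish": 8, "cost": 5.0},
    {"id": 3, "finish": 9, "cost": 4.0},
]
rules = Constraints(
    hard=[lambda s: s["finish"] <= 10],   # the deadline must be met
    soft=[(1.0, lambda s: -s["cost"]),    # prefer cheaper schedules
          (0.5, lambda s: -s["finish"])], # and, less strongly, earlier ones
)
print(best_solution(schedules, rules)["id"])  # 3
```

Schedule 1 is the cheapest but misses the deadline, so it is never ranked; the weights then decide between the two feasible schedules.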

In Section 3 we analyzed the relationship between goal interactions and plan quality. Wilensky’s
planner [Wilensky, 1983] takes advantage of this relationship. He analyzes the different types
of goal interactions and develops meta-planning mechanisms that deal with them. When a goal overlap,
or positive goal interaction between a planner’s goals, occurs, his planner is able to carry out an action
that is in the service of a number of goals at once. This might involve executing a single plan that
simultaneously fulfills several goals, achieving a goal that serves more than one purpose, or employing
a plan that is worthwhile only when a sufficient number of similar goals is involved. External positive
goal interactions may also occur, in which two or more planners have similar goals (termed goal
concord). In Wilensky’s system, then, a goal overlap situation provides an opportunity to achieve goals
more economically than they could be achieved otherwise, and the planner prefers efficient plans over
inefficient ones. This principle would appear to be the underlying justification for a number of processes
incorporated in other planning systems. For example, several of NOAH’s critics [Sacerdoti, 1977], including
“use existing objects,” “eliminate redundant preconditions,” and “optimize disjuncts,” are motivated by this
idea and correspond to particular kinds of goal overlap situations.

Some existing domain-independent planning systems solve multiple-goal problems by developing separate
plans for the individual goals, combining these plans to form a naive plan for the conjoined goal, and
then performing optimizations to yield a better combined plan [Nau et al., 1990]. However, they restrict the
types of goal interactions that may happen. In this context, the quality of a plan is considered only in terms of
dealing with goal interactions and taking as much advantage of them as possible. A similar mechanism is also
used by some domain-dependent planners [Hayes, 1990, Nau, 1987].

Several systems perform plan debugging as their problem-solving strategy [Sussman, 1975, Hammond,
1987, Simmons, 1988]. They employ heuristic rules to generate an initial hypothesis and then debug it if
the hypothesis is incorrect. Therefore they fix planning failures (not execution failures). An example is the
Generate, Test and Debug paradigm [Simmons, 1988], in which the debugger analyzes causal explanations
for why a bug arises and fixes it by replacing the assumptions underlying it. The debugger is only used if the heuristic
generator produces an incorrect hypothesis. In contrast, our planner generates correct plans, and our goal is
not to fix them but to improve their quality. Our approach does not perform post-facto modification of the
plans, but analyzes the problem-solving process to extract knowledge that will guide the problem solver
towards better solutions.

The problem of finding optimal plans has been attacked by decision theorists. However, this problem is
computationally very expensive. Simon introduced the idea of “satisficing” [Simon, 1981], arguing that a
rational agent does not always have the resources to determine what the optimal action is, and instead should
attempt only to do well enough, that is, to satisfice. Some current work on planning (for example [Pollack,
1992]) is about the tradeoff between getting around to acting and spending enough time thinking. Such
resource-bounded reasoning leads to suboptimal behavior. In our work we do not consider the tradeoff
between acting and planning time. We acknowledge the computational cost of finding the optimal behavior,
and do not claim that the acquired control knowledge will necessarily guide the planner to optimal plans, but
rather that plan quality will improve incrementally with experience as the planner sees new interesting problems
and interacts with the human expert. The framework presented in [Feldman and Sproull, 1977] incorporates
decision theory into planning. The utility function of decision theory is used to decide which strategy to use
to achieve a goal, taking into account such factors as reliability, the complexity of the strategy, and the value
of the goal. It also allows an existing plan to be improved prior to its execution to increase its utility. The
costs and utilities of each plan step have to be expressed as numbers, and these numbers have to be obtained
in some way. We believe that this limits the range of quality measures that can be expressed. In addition, it
seems hard to encode the expert knowledge into these values. On the other hand, in this framework it is easier
to deal with conflicts among measures. There is also a body of recent work on methods to choose and/or
learn optimal policies of action. Some examples are reinforcement learning (e.g. [Lin, 1992]), dynamic
programming, and real-time A* [Korf, 1988]. However, these methods have been applied mostly to more reactive
models, and much less to solving complex planning problems.
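As a rough illustration of decision-theoretic strategy choice, the sketch below scores each strategy by expected utility, assuming utility is reliability times goal value minus execution cost. This simplified formula is our own assumption, not Feldman and Sproull’s exact formulation; `Strategy`, `expected_utility`, `choose_strategy`, and all the numbers are hypothetical. It also makes visible the limitation noted above: every quality factor must first be turned into a number.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Strategy:
    name: str
    reliability: float  # assumed probability that the strategy achieves the goal
    cost: float         # assumed execution cost, in the same units as the goal value


def expected_utility(strategy: Strategy, goal_value: float) -> float:
    # Simplified model: gain the goal's value with probability `reliability`,
    # and pay the strategy's cost either way.
    return strategy.reliability * goal_value - strategy.cost


def choose_strategy(strategies: Sequence[Strategy], goal_value: float) -> Strategy:
    return max(strategies, key=lambda s: expected_utility(s, goal_value))


options = [Strategy("quick-but-risky", 0.6, 1.0), Strategy("slow-but-safe", 0.95, 4.0)]
print(choose_strategy(options, goal_value=10.0).name)  # slow-but-safe
print(choose_strategy(options, goal_value=5.0).name)   # quick-but-risky
```

Lowering the value of the goal flips the choice toward the cheaper, less reliable strategy, which is how the value of the goal enters the decision in this simplified model.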

The complexity of the problem of finding optimal solutions in the blocks world is analyzed in [Gupta
and Nau, 1991] and [Chenoweth, 1991]. In both cases optimal solutions are shortest-length plans. [Korf,
1985] analyzes how the use of macro-operators in problem solving affects solution quality, but only from
the point of view of solution length. In particular, in the Eight Puzzle and the 3x3x3 Rubik’s Cube domains
the solution lengths are approximately equal to or less than those of human strategies.
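For very small blocks-world problems, a shortest-length plan can be found by breadth-first search over states. The sketch below only makes the notion of a shortest-length optimal plan concrete; the number of states grows very quickly with the number of blocks, so it does not scale. The state representation (a frozenset of stacks, each a tuple from bottom to top) and the functions `moves` and `shortest_plan` are our own hypothetical choices.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterator, List, Optional, Tuple

Stack = Tuple[str, ...]   # blocks from bottom to top
State = FrozenSet[Stack]  # a set of stacks; each block appears exactly once
Move = Tuple[str, str]    # (block moved, destination block or "table")


def moves(state: State) -> Iterator[Tuple[Move, State]]:
    """Yield every legal single-block move and the state it produces."""
    stacks = list(state)
    for i, src in enumerate(stacks):
        block = src[-1]  # only the top block of a stack can move
        others = stacks[:i] + stacks[i + 1:]
        leftover = [src[:-1]] if len(src) > 1 else []
        if len(src) > 1:  # moving a block that is already on the table is pointless
            yield (block, "table"), frozenset(others + leftover + [(block,)])
        for k, dst in enumerate(others):
            stacked = others[:k] + [dst + (block,)] + others[k + 1:]
            yield (block, dst[-1]), frozenset(stacked + leftover)


def shortest_plan(start: List[Stack], goal: List[Stack]) -> Optional[List[Move]]:
    """Breadth-first search: the first time the goal is reached, the plan is shortest."""
    start_state, goal_state = frozenset(start), frozenset(goal)
    parent: Dict[State, Optional[Tuple[State, Move]]] = {start_state: None}
    queue = deque([start_state])
    while queue:
        state = queue.popleft()
        if state == goal_state:
            plan: List[Move] = []
            link = parent[state]
            while link is not None:  # walk back to the start state
                state, move = link
                plan.append(move)
                link = parent[state]
            return plan[::-1]
        for move, nxt in moves(state):
            if nxt not in parent:
                parent[nxt] = (state, move)
                queue.append(nxt)
    return None


# Sussman anomaly: C is on A, B is on the table; the goal is A on B on C.
print(shortest_plan([("A", "C"), ("B",)], [("C", "B", "A")]))
# [('C', 'table'), ('B', 'C'), ('A', 'B')]
```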


7.2  Work  on  Acquisition  of  Control  Knowledge 

ASK [Gruber, 1989] is probably the piece of work on knowledge acquisition most closely related to ours. ASK
is an interactive knowledge acquisition tool that elicits strategic knowledge from people in the form of
justifications  for  action  choices,  and  generates  strategy  rules  that  operationalize  and  generalize  the  expert’s 
advice.  Gruber  distinguishes  between  control  knowledge  and  strategic  knowledge.  Control  knowledge 
refers  to  knowledge  used  to  decide  what  to  do  next.  Strategic  knowledge  is  a  subset  of  control  knowledge, 
and  it  is  used  by  an  agent  to  decide  what  action  to  perform  next,  when  actions  have  consequences  external 
to  the  agent.  Search-control  knowledge  is  used  to  choose  internal  actions  that  increase  the  likelihood  of 
reaching  a  solution  state  and  improve  the  speed  of  computation. 

ASK  is  integrated  with  the  MU  architecture,  used  typically  for  heuristic  classification.  MU  organizes 
the  factual  knowledge  as  a  symbolic  inference  network,  where  inferences  are  propagated  from  evidence 
to hypotheses by local combination functions. ASK acquires strategy rules, inspired by the meta-rules in
NEOMYCIN, that map strategic situations to sets of recommended actions. There are three categories
of strategy rules: focus rules (propose a set of possible actions), filter rules (prune actions that violate
constraints), and selection rules (prefer a subset of remaining actions). These rules are similar to PRODIGY’s
select, reject and prefer control rules. ASK’s knowledge acquisition process has five steps:

1. eliciting the user’s critique by presenting a list of chosen actions and asking what should have been done
differently.

2.  credit  assignment  analysis:  check  how  existing  rules  matched  and  determine  the  requirements  for  a 
new  rule. 

3. eliciting justifications, by displaying features of the knowledge base and allowing the user to choose
some. New features can be acquired if needed, provided they are analogous to existing features.

4.  generating  and  generalizing  a  strategy  rule,  possibly  with  user’s  guidance  in  the  generalization  process. 

5. verifying a rule, by presenting the user with a paraphrased description of the rule and its consequences.
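ASK’s three categories of strategy rules (focus, filter, selection) can be read as a pipeline over candidate actions, much like PRODIGY’s select, reject and prefer control rules. A minimal sketch, assuming rules are plain Python functions over a situation dictionary; `recommend` and the example rules and situation are hypothetical and do not reproduce the actual representation used in MU or ASK.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Set

Situation = Dict[str, Set[str]]
FocusRule = Callable[[Situation], Iterable[str]]             # proposes actions
FilterRule = Callable[[Situation, str], bool]                # True means "prune this action"
SelectionRule = Callable[[Situation, List[str]], List[str]]  # returns a preferred subset


def recommend(situation: Situation,
              focus_rules: Sequence[FocusRule],
              filter_rules: Sequence[FilterRule],
              selection_rules: Sequence[SelectionRule]) -> List[str]:
    # 1. Focus: collect every action proposed by any focus rule, without duplicates.
    proposed: List[str] = []
    for rule in focus_rules:
        for action in rule(situation):
            if action not in proposed:
                proposed.append(action)
    # 2. Filter: prune actions that some filter rule says violate a constraint.
    remaining = [a for a in proposed
                 if not any(prune(situation, a) for prune in filter_rules)]
    # 3. Selection: each rule narrows the set, but is ignored if it would empty it.
    for rule in selection_rules:
        preferred = [a for a in rule(situation, remaining) if a in remaining]
        if preferred:
            remaining = preferred
    return remaining


# Hypothetical diagnostic situation and rules.
situation = {"done": {"ask-history"}, "cheap": {"ask-history", "blood-test"}}
result = recommend(
    situation,
    focus_rules=[lambda s: ["ask-history", "blood-test", "biopsy"]],
    filter_rules=[lambda s, a: a in s["done"]],
    selection_rules=[lambda s, acts: [a for a in acts if a in s["cheap"]]],
)
print(result)  # ['blood-test']
```

Keeping the previous set when a selection rule would empty it is a design choice of this sketch: preferences only reorder or narrow the options, whereas pruning is left to the filter rules.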

ASK’s success relies on the relevant control features being defined in advance or being analogous to
existing ones, and on the user’s understanding of the opportunistic control model that underlies the strategy-rule
representation. These limitations are to some extent shared by our approach. However, due to ASK’s control
model it is awkward to acquire goal-directed strategies; ASK’s planning method is purely reactive, with no
projection (lookahead) and no possibility to undo actions. In addition, ASK does not focus on plan quality.

The system presented in [Golding et al., 1987] acquires general search-control knowledge for Soar from
a human advisor. In general, when Soar has to decide among several courses of action, it reaches an impasse,
automatically generates a subgoal, and searches to solve it. Instead, this system requests advice from a
human to resolve an impasse. This request can take one of two forms. In the first mode, direct advice, the
advisor tells Soar which alternative to select. There is no operationalization problem, as the system forces
the expert to name a particular operator. In the second mode, the advisor supplies a problem within Soar’s
grasp that illustrates what to do. In both cases, after the problem is solved, Soar’s learning mechanism,
chunking, transforms the advice into search-control knowledge. Chunking takes care of the generalization.
The system may learn from failure if the advice is incorrect. There is no clear distinction between the
knowledge acquired to reduce search (by avoiding subgoaling to solve an impasse) and the knowledge that
guides search to a particular solution. Chunking captures the reasons why the advice is correct, but it is not
clear how it may justify why one path is better than another in terms of the quality of the solution.


7.3 Other Work on Knowledge Acquisition for PRODIGY

Section 2.6.1 briefly presented the work done in the context of the PRODIGY architecture to speed up the
problem-solving process. Our work does not focus on speed-up learning, but on improving the quality of
PRODIGY’s solutions. In addition, these systems are fully automated and acquire control knowledge by introspection,
while we interact with an external source, the human expert.

Other work in our project has focused on the acquisition of factual, or domain, knowledge. APPRENTICE
[Joseph, 1992] performs knowledge acquisition of domain knowledge through a graphical interface. It
also provides tools to view plan execution. Work on learning by experimentation [Carbonell and Gil, 1990,
Gil, 1992] focuses on the automatic refinement of incomplete operators by performing directed experimentation
on the environment. However, none of this work deals with the problem of acquiring control knowledge
that improves the quality of PRODIGY’s solutions with experience.

8  Conclusion 

In this section we present the expected contributions of this work. This is the first piece of work that explores
and taxonomizes quality metrics for planning systems. The analysis is put into practice in the form of
control knowledge that guides the planner’s search towards higher-quality solutions according to a given
plan evaluation function. This knowledge can be learned from the system’s problem-solving experience.
We believe that both knowledge about plan quality and its automated acquisition from problem-solving
experience are key factors for planning systems that move towards more complex and realistic domains.